Writers are frequently confronted by the need of presenting
numerical facts to their readers. Quantitative data concerning such
matters as imports, earnings, products, population, historical statistics
and the like, are not infrequent in their occurrence, and it is, therefore,
important to know what form of tabular, textual or graphic arrangement
is preferable under different circumstances. The present study
reports the results of an objective measurement of the effect upon
several thousand junior high school children of various arrangements of
quantitative material.

The procedure was as follows:² A short account of the economic
history of Florence was written. This unit was selected in order that
the content might be equally unfamiliar to all the children. The narrative
included one paragraph which contained specific quantitative
facts. In each of the forms this paragraph was varied. The rest of
the material was constant; and the paragraph in question varied only
in the form of presentation, not in the data themselves.³ In some of the


1 One of a series of studies from the Lincoln School Social Science Laboratory,
directed by Dr. H. O. Rugg.

2 Only a general outline of the procedure is given here. The detailed account,
reproductions of all forms used, and a complete list of scores are on file in the
Social Studies Laboratory of the Lincoln School of Teachers College directed by
Dr. Harold Rugg. The material is open for inspection.

3 In this statement of the constant and variable features of the experimental
material, simplicity is gained at some cost to accuracy. A more accurate statement
of the case would be the following: The economic history of Florence was
written in two parts, which were given both jointly and separately, the results
being compared. There was but one variable paragraph in Part I, and but one
variable paragraph in Part II. There were 15 forms of the variable paragraph in

362 The Journal of Educational Psychology 


forms the quantitative paragraph appeared as a statistical table, in
others as a bar-graph, in others as a pictograph or a line-graph and in 
still others the data were given in narrative form. 

Between 200 and 300 pupils were tested on each form. The tests, 
which were identical for all forms, consisted of two parts—the first 
pertaining to the story in general, and the second to the quantitative 
paragraph. The first part was used as a measure of the influence of 
non-experimental factors such as attention, reading ability, general 
intelligence, and the like. A method was devised by which these non-experimental
factors could be discounted.¹ But in no instance is the





Part I, and 14 forms of the variable paragraph in Part II. The content of the
variable paragraph in Part I was constant as far as subject-matter was concerned, 
but a few of the forms contained a larger number of items than the others. In 
Part II both content and the number of items were constant in all 14 forms of the 
variable paragraph. The content of Part I and Part II was, of course, different 
and many of the forms of presentation were different. But the line-graph, bar- 
graph, pictograph, paragraph, and statistical table appeared in both parts. 

1 There were 12 questions which had no connection with the experimental
(numerical) paragraph. The per cent of correct answers to each of these questions 
was listed and the average taken as the score of the non-numerical group (Group 1). 
(For fuller account of method of scoring see footnote to tables of scores.) 

Now suppose that form 16 scored 80 in Group 1 (non-numerical questions) and 
30 in Group 8 (recall of specific numerical facts) while form 17 scored 60 in Group 
1 and 25 in Group 8. Is the inferiority of form 17 to form 16 in Group 8 due to 
inferior intelligence, attention, etc.? Or is it due only to experimental factors? 
In other words, if the children who took the two forms had scored equally in 
Group 1 (questions concerning non-experimental factors) what would their 
relative standing have been in Group 8 (questions concerning experimental factors)? 
Since all scores were in terms of per cents of correct answers it was conceived that a 
simple proportion could be used to decide this question. These proportions could 
be worked out very quickly with a slide rule when all non-numerical groups were 
counted as 50 (a score lower than any of the actual scores). The resultant scores 
were called proportional scores—P Scores, and the initial scores are referred to as 
I Scores. In the suppositional forms 16 and 17, the P scores are formed as follows: 

(I Score Group 1):(P Score Group 1)::(I Score Group 8):(P Score Group 8)

The I Score of Group 1 is known and 50 is taken arbitrarily as the P Score of Group
1. The I Score of Group 8 is also known. The P Score of Group 8 is to be found.
Substituting scores in the formula for form 16:

80:50::30:X

X (P Score in Group 8, form 16) = 18.75.

Again for form 17:

60:50::25:X

X (P Score in Group 8, form 17) = 20.83.

Here the relationship between the two forms in Group 8 is reversed by using
the P Score instead of the I Score. But it should be noted that this was caused by an
interpretation of results as cited in the present article altered by this
statistical method of discounting individual differences. It merely 
sharpened the differences indicated by the raw scores. A further 
equalization of such factors as social and economic conditions, school 
influence and the like, was attempted by a distribution of all forms in 
each school and, as far as possible, in each class. All the tests were 
given with a uniform procedure by two individuals. 

The story, designed to contain about as many facts as a somewhat
difficult history assignment, follows:


Part I 


HOW OUR MODERN SYSTEM OF BANKING GREW UP


In very ancient times—long before the year 1—there was money lending, but 
not until about 600 years ago did methods of money lending become anything like 
our modern banking methods. It is very important to know how people do their 
lending and borrowing. Without a system of banking like our modern system, 
world-wide trade, like our modern trade, would be impossible. We would probably 
have only little local shops, and people in different parts of the world would know 
very little of one another. 

The European system of banking helped probably more than any other one 
thing to spread western trade and civilization over the world. Perhaps if China 
had been the first nation to work out our present system of banking, Chinese 
civilization would have been spread over Europe and America, instead of the other 
way around. 


WHERE DID THE MODERN BANKS START?


Our present system of banking has grown up from the clever system of money 
lending worked out by some Italian merchants in the city of Florence. 

Florence is in the Italian State of Tuscany, which is north of Rome, about 
half way between that city and the foot hills of the Alps. It is right on the main 





original difference of score in Group 1 of 20 points and a difference in the same 
direction in Group 8 of only 5 points. Scores bearing such a relationship to each 
other did not exist to any significant extent in the final results. Hence the P 
Scores were not used. But in the original preliminary scoring and working up of 
results, when there were less than 100 tests in each form, (in other words when the 
groups were far less nearly equivalent) the P Scores were used very successfully. 
The value of this method of discounting non-experimental factors was indicated 
by the fact that the results obtained by using the P Scores were closely parallel to 
the final results obtained by using many more children, i.e., by increasing the 
equivalence of the groups. 

The P Scores of all forms in all questions and groups are in the tables on file in 
the Social Studies Laboratory of the Lincoln School of Teachers College. 
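
The proportion by which the foregoing footnote forms P Scores can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the original study: the function name `p_score` and its parameter names are hypothetical, and the baseline of 50 is the arbitrary value which the footnote assigns to Group 1.

```python
def p_score(i_score_group_1: float, i_score_group_8: float,
            baseline: float = 50.0) -> float:
    """Solve (I Score Group 1) : baseline :: (I Score Group 8) : X for X."""
    if i_score_group_1 <= 0:
        raise ValueError("the Group 1 score must be positive")
    return i_score_group_8 * baseline / i_score_group_1


# The two suppositional forms used in the footnote.
form_16 = p_score(80, 30)
form_17 = p_score(60, 25)

print(f"form 16: {form_16:.2f}")  # form 16: 18.75
print(f"form 17: {form_17:.2f}")  # form 17: 20.83
```

With the I Scores, form 16 leads form 17 in Group 8 (30 against 25); with the P Scores, form 17 leads (20.83 against 18.75), which is the reversal the footnote describes.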
road from Northern Europe to Rome, at a place where that road crosses the Arno
River. Many traders passed through Florence on their way to Rome; and their 
money helped Florence to grow. 

But it was not until the beginning of the 1200’s that the merchants of Florence 
were wealthy enough to begin thinking seriously about banking. It is in the year 
1200 that we first hear of a guild (or association of bankers) in Florence. 


HOW THE WOOL-DYERS OF FLORENCE HAPPENED TO START BANKING


From very early times there was a union of wool-dyers and merchants in 
Florence who called themselves the Calimala Guild. These Calimala merchants 
had to send buyers into foreign lands to get wool, because the sheep near Florence 
gave a poor grade of wool. From their buyers they learned most of the methods 
of dyeing cloth that were known in the world at that time, whether Turkish or 
Spanish, French or English. And soon Florentine dyed woolen-cloth was con- 
sidered the finest in the world. Then, of course, the foreign merchants who sold 
raw wool to Florence were interested in buying the dyed woolen cloth back from 
Florence; and the Calimala buyers became also sellers—or foreign agents. 

By 1150 the Calimala merchants had agents in nearly every country in Europe 
and even in the Orient. They bought raw wool in foreign countries and sold it to 
the wool manufacturers in Florence. After it was made into cloth the Calimala 
merchants bought it again from the manufacturers and sold it back to the foreign 
countries. So they sold each piece of goods twice—once to the wool manufac- 
turers and once in foreign countries. On each sale they made a profit. The wool 
manufacturers also profited—once—on every piece of cloth the Calimala mer- 
chants handled. Do you think that the business of the wool manufacturers grew 
along with the business of the Calimala merchants? Do you think it grew as fast? 

Cotton cloth was very little known in Europe in those days. Nearly everybody 
wore wool, and about the only other kind of cloth that was worn was silk, and that, 
of course, only by wealthy people. Silk cloth was also made in Florence. So the 
three big guilds of the city: The Calimala merchants, the wool manufacturers and 
the silk merchants, supplied a large part of the best clothing in Europe. Do you 
think this fact had anything to do with the fact that banking started in Florence? 
Would you expect banking to start where there was a surplus of money, or where 
there was very little money? Do you think the Calimala, wool, and silk guilds 
brought much money into Florence from other parts of the world? Which one of 
them do you think brought in the most? 


HOW THE CLOTHING GUILDS BROUGHT MONEY INTO FLORENCE AND THEN LOANED
IT BACK TO FOREIGN COUNTRIES


Money came into Florence through many merchants and tradesmen, but the 
great clothing guilds more than any other single group of men, helped to make 
Florence the wealthiest city of Europe. 

Here is a table of numbers showing the yearly income of some of the leading 
members of the three guilds: 

Year    Calimala merchants'   Wool manufacturers'   Silk merchants'
        income estimated      income estimated      income estimated
        in U. S. money        in U. S. money        in U. S. money

1100    $     5,000           $ 1,000,000           $     2,000
1175      3,000,000             2,000,000               100,000
1250      4,000,000             3,000,000               200,000
1358      7,000,000             5,000,000               600,000
1438     10,000,000             7,000,000             2,000,000

Read the foregoing numbers carefully and notice which of the three industries 
grew more rapidly: the silk merchants, the wool manufacturers or the Calimala 
merchants. 

When the Calimala merchants grew so rich that they had more money than 
they could use in their own business, they began to lend their money at interest to 
other merchants in Florence. But other merchants in Florence were also growing 
rich, and also had money to lend; and pretty soon none of them could get very good 
interest from the borrowers. Can you explain why this was? 

As a result of this the Calimala merchants sent word to their agents (buyers 
and sellers) in foreign countries that they should offer to lend money whenever 
they saw a chance for a safe investment and good interest. Soon the merchants 
received so many offers, that they did not have enough money to meet them all. 
And the opportunities were too good to turn down. In many of the countries 
money was very much needed and interest was high. So the Calimala merchants 
borrowed money in Florence from merchants who did not have wide foreign 
connections. They borrowed at a low rate of interest in Florence and lent at a 
high rate abroad. The difference they kept for themselves. This was the 
beginning of the great Florentine banking houses. 

The business of borrowing and lending money soon became so large that it 
could no longer be handled by the same agents who handled the woolen business. 
So separate banking houses were started, bank representatives were sent to 
England and other countries, and a guild of bankers was formed. This took place,
as you have already learned, about the year 1200. At that time, in Florence, our 
present banking system is said to have been born. 


Part II 


WHAT EFFECT DID THE GROWTH OF BANKS HAVE UPON THE GROWTH OF
FLORENCE?


The banks of Florence grew all through the 1200’s and 1300’s and the early 
part of the 1400’s. By 1422 Florence was considered the wealthiest city in Europe. 
But shortly after that date (in 1444) one of the bankers managed to get most of the 
money into his own control. This was the financial genius Cosimo de’ Medici 
(pronounced de Medichee). He not only controlled the money of many merchants, 
but the public funds as well; and by clever trickery he got almost complete control 
of the government itself. Then he set about to ruin all of the powerful men of
Florence whose money he did not control. 

After Cosimo de’ Medici managed to get the government of Florence into his 
own hands in 1444, the city was never able to get rid of the power of the Medici 
family until that family died out several centuries later. What effect do you 
think the growth of banking and foreign trade had upon the growth of the popu- 
lation of Florence? And what happened to the population after the Medicis (in 
1444) got control of most of the money in the republic? 

Here is a picture that will help you answer these questions: 


Size of the population of Florence:

[Pictograph; the pictures are not reproduced.]

 1100        1250        1336        1478        1574

65,000      130,000     180,000     71,000      50,000
people      people      people      people      people


In 1530 the republic of Florence which had lasted for centuries, was done away 
with, and the de’ Medici family were made hereditary rulers of the city and its 
possessions. Of course, they had really been rulers since 1444, but after 1530 the 
people could not even pretend to govern themselves anymore; and if anyone said 
anything against the de’ Medici that person was hanged for disloyalty to his 
country. 

After that the great bankers of Europe lived in other countries. 


In the foregoing story, form 6 (a statistical table) has been used in 
Part I and form .013 (a picto-graph) has been used in Part II. This 
pictograph is the only form of Part II that produced results which 
differed at all markedly from the results of the corresponding forms in 
Part I. Most of the forms of Part II were variations of the statistical 
table which produced very slight differences in score and are not dis- 
cussed in this paper. Hence form .013 is the only form of Part II 
which is reproduced. 

However, all the forms of Part I which were used are reproduced
below. They were inserted in place of the statistical table which 
appears in the reproduction of the story. 


Form 1


The income of the leading Calimala merchants grew very much between 1100 
and 1438. In 1100 their income probably did not amount to more than $5,000 
(estimated in U. S. money). But in 1438 they earned more than $10,000,000.
The income of the wool manufacturers grew almost as rapidly (although it was not 
as large) as that of the Calimala merchants. In the year 1100 the wool manu- 
facturers earned a little over $1,000,000. In 1438 they earned about $7,000,000. 
The income of the silk merchants (though much smaller) also grew: in 1100 it was 
only about $2,000 while in 1438 it was over $2,000,000. 

Read the foregoing paragraph carefully and notice which of the three industries 
grew more rapidly: the silk merchants, the wool manufacturers or the Calimala 
merchants.¹


Form 2 


The income of the leading Calimala merchants grew very much between the 
years 1100 and 1438. In 1100 they earned only about $5,000 (estimated in U. S. 
money); but by 1438 they earned yearly about $10,000,000. The earnings of the 
wool manufacturers grew from $1,000,000 in 1100 to $7,000,000 in 1438. 


Form 3 


Form 3 was identical with form 2 except that numbers were written “5 thousand
dollars,” etc.


Form 4


Form 4, also, was identical with form 2 except that numbers were written “five
thousand dollars,” etc.


Form 5


Same as form 6 except that it is followed by these questions: 

Before reading further see if you can answer these questions: Which one of the 
three industries earned the most in the year 1100? Which one earned the most in 
1438? In 1100 how much more did the Calimala merchants earn than the silk 
merchants? How much more did they earn in 1438? How much more did they 
earn in 1438 than they did in 1175? How much more did the wool manufacturers 
earn in 1438 than they did in 1175? How much more did the silk merchants earn 
in 1438 than in 1175? Did the banks, which started in the year 1200, cause the 
earnings of the Calimala merchants to become greater than the earnings of the 
wool manufacturers? 











Form 6 

Year    Calimala merchants'   Wool manufacturers'   Silk merchants'
        income estimated      income estimated      income estimated
        in U. S. money        in U. S. money        in U. S. money

1100    $     5,000           $ 1,000,000           $     2,000
1175      3,000,000             2,000,000               100,000
1250      4,000,000             3,000,000               200,000
1358      7,000,000             5,000,000               600,000
1438     10,000,000             7,000,000             2,000,000

1 The instruction: “Read the foregoing,” etc. followed all the forms but is
reproduced here only in connection with form 1.


Form 7

Year    Calimala merchants'   Wool manufacturers'   Silk merchants'
        income estimated      income estimated      income estimated
        in U. S. money        in U. S. money        in U. S. money

1100    $     5,622           $ 1,342,768           $     2,340
1438     10,672,439             7,478,924             2,365,337


Form 8

Year    Calimala merchants'   Wool manufacturers'   Silk merchants'
        income estimated      income estimated      income estimated
        in U. S. money        in U. S. money        in U. S. money

1100    $     5,000           $ 1,000,000           $     2,000
1438     10,000,000             7,000,000             2,000,000


Form 10 


Same as form 11 except that it is followed by the questions which also follow 
form 5; and are reproduced in connection with that form. 


Form 11

[Bar-graph; the bars are not reproduced.]

YEAR    GUILD

1100    CALIMALA    $     5,000
        WOOL        $ 1,000,000
        SILK        $     2,000

1175    CALIMALA    $ 3,000,000
        WOOL        $ 2,000,000
        SILK        $   100,000

1438    CALIMALA    $10,000,000
        WOOL        $ 7,000,000
        SILK        $ 2,000,000


Form 12

[Bar-graph; the bars are not reproduced.]

            YEAR    AMOUNT OF INCOME (estimated in U. S. money)

SILK        1100    $     2,000
            1175    $   100,000
            1438    $ 2,000,000

WOOL        1100    $ 1,000,000
            1175    $ 2,000,000
            1438    $ 7,000,000

CALIMALA    1100    $     5,000
            1175    $ 3,000,000
            1438    $10,000,000


Form 13

[Pictograph; the pictures are not reproduced. Legible labels: the years 1100, 1175 and 1438, repeated for each group, and CALIMALA.]


Form 14

[Pictograph; the pictures are not reproduced. Legible labels: YEAR 1100, YEAR 1175, YEAR 1438, and WOOL.]

Scores are in terms of percentages and average percentages.
Questions were tabulated separately, and the per cent of correct 
answers to each question is the score of that question. Group scores 
consist of an average percentage of a single form in a group of questions: 
Thus when form 15 is listed as having a score of 81 in Group 7 (dynamic 
comparisons) it means that 90 per cent of the children who studied that 
form (a line graph) answered question 1 in the test correctly, 70 per 
cent answered question 3 correctly, and 83 per cent answered question 
4 correctly. Questions 1, 3 and 4 constitute Group 7; and the average 
of 90 per cent, 83 per cent and 70 per cent is 81 per cent. Most of the 
groups were composed of from five to eight questions. 
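
The scoring rule just stated can be checked with a short Python sketch. It is an illustration only and not part of the original study: the function name `group_score` and the dictionary layout are hypothetical, while the percentages are those given above for form 15 in Group 7.

```python
from statistics import mean


def group_score(percent_correct_by_question: dict) -> float:
    """Average the per-question percentages of correct answers for one form."""
    return mean(list(percent_correct_by_question.values()))


# Form 15 (a line graph), Group 7 (dynamic comparisons): questions 1, 3 and 4.
form_15_group_7 = {1: 90, 3: 70, 4: 83}

print(f"{group_score(form_15_group_7):.0f} per cent")  # 81 per cent
```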


OUTLINE OF CONCLUSIONS 


The following is an outline of the questions which the present 
experiment undertook to answer and such answers as were found. 

1. Does it make much difference how quantitative data are 
arranged provided they are put together in some coherent form? 
In other words, is the choice of graphic, tabular or textual form merely 
a matter of taste or is it of some moment to the learner? 
The answer to this question was the expected one: The forms in
which complex numerical data are presented make a decided difference 
in the resultant learning. For example, of the 226 children who studied 
form 15, a line graph, less than 3 per cent answered question 12 in the 
test correctly, whereas 38 per cent of the children who studied identical 
data in tabular form, form 6, answered the same question correctly. 
That this was not due to greater familiarity with one form than with 
the other was indicated, as will be shown below, by the fact that in a 
different type of question the relative effectiveness of the two forms 
was reversed. 

2. Should the number of items determine the arrangement of 
items? For example, should a bar-graph be used when there are but 
few items to be presented, a table when there are a greater number of 
items and a line graph when the number of items is yet greater? The 
answer to these questions was fairly clear: 

Above a certain quite low limit of complexity¹ increase in the number
of data did not affect in any way the relative effectiveness of the various 
forms. For instance, if a statistical table is appropriate for 10 data, 
it is equally appropriate for 15 or 20 data. Below the limit of com- 
plexity referred to, the test did not differentiate clearly between the 
various forms. But the tendency seemed to be the same for the most 
meagre data as for the most complex. (“Within the limits of this
experiment” should be understood to follow each of the statements
made here.) One difference between the simple and complex data 
should be noted: The pictograph seemed a fairly effective form of 
arrangement for the simplest data, whereas it was very ineffective 
for the more complex data. 

3. Which affects more vitally the recall of quantitative data—the 
quantity or the arrangement of the data? 

The answer to this question is twofold: 

(A) In the matter of specific amounts, both quantity and arrangement 
of data affect recall markedly and consistently. The smaller the quantity 
and the simpler the pattern the better the recall of specific amounts.





1 Complexity here means not the number of items involved but the number of 
comparisons which those items call for: For instance 11 items calling for the com- 
parison of the earnings of three guilds at two different dates was apparently more 
difficult (complex) than 11 items showing the population of Florence at five differ- 
ent dates. Rather large units were counted as single items: The name of Florence 
as one item, the five dates as one item each, and the five figures of five and six 
places each as one item each. 
(B) In the matter of relative amounts (i.e., static comparisons—a
phrase which is later explained) arrangement of data is of primary
importance, and differences in quantity of data (within the limits of this
experiment) have little effect upon recall. For instance, the recall of 23
items in tabular form (form 6) was, in respect to relative amounts, 
superior to the recall of 11 items in paragraph form (form 1). 

4. Is the difference in effectiveness of the forms due to their visual 
differences or to differences in logical arrangement? For instance, 
would the effectiveness of numbers which are grouped in a certain way 
be altered by the addition of bars or pictures or lines, without changing 
the grouping of the numbers? 

Explanation.—It might be well to explain further the distinction
made here between logical and visual arrangement. Usually a dif- 
ference in visual pattern involves a difference in the grouping of 
numbers. Unfortunately the simple experiment suggested in the 
question above was not made; but an answer to the question was 
arrived at through a comparison of two bar-graphs and two picto- 
graphs; form 11 is a bar-graph with the earnings of the various guilds 
grouped according to dates. Form 12 is a bar-graph with the earnings 
in the various dates grouped according to guilds. Both are bar-graphs 
and hence visually similar. Identical data are presented in both, but 
the groupings or logical arrangements are dissimilar. In the picto- 
graphs, forms 13 and 14, the groupings (logical arrangements) cor- 
respond with forms 12 and 11 respectively; but the visual patterns are 
very dissimilar. From a comparison of these forms the following 
conclusion was reached: 

Both logical and visual arrangement of data have an important effect 
upon learning; the logical arrangement is the more important in respect 
to the recall of relative amounts; and visual pattern is the more important 
in respect to the recall of specific amounts. 

5. Can the forms be given a rank order for general effectiveness? 
That is, is there any form which is more effective in all respects than 
any other form? The answer to this question is simply “no.” On
the other hand a certain negative rank order might be made: Since in 
presenting moderately difficult data the paragraph and pictograph are 
surpassed by one form or another in all respects, and since even in the 
simplest data the pictograph is not significantly better than and the 
paragraph is inferior to other forms, it may safely be said: When in 
doubt about the difficulty of the data, never use a paragraph or pictograph.
In other words these two forms, in a general sort of way, rank
lowest. A more general and accurate answer to the question is: 

The different groupings and visual arrangements of data foster the 
learning of different types of fact. 

Explanation.—Analysis of the test results showed that a form often 
produced superior scores in one group of questions and inferior scores 
in other groups. Further analysis showed that the questions in each 
group had some aspect in common. All the questions in one group 
dealt with specific amounts earned (e.g., “How much did the wool
merchants earn in the year 1100?”). All the questions in another
group dealt with relative amounts earned at specified times; such
questions are referred to throughout this study as static comparisons
(e.g., “Who earned the most in the year 1100, the wool, silk or Calimala
merchants?”). All the questions in a third group dealt with relative
increase, decrease or fluctuation; such questions are termed in this
study, dynamic comparisons (e.g., “Between the years 1100 and 1438
whose earnings increased most rapidly, those of the wool, silk, or
Calimala merchants?”). In short, the questions concerning quantitative
facts fell into three groups: 1. Specific amounts. 2. Static comparisons.
3. Dynamic comparisons. Each of these groups of facts was
most effectively conveyed by a particular form—different in each case. 

Another significant fact should be noted: There was no correlation 
between scores obtained in specific amounts and those obtained in static 
and dynamic comparisons; nor between the latter two; nor between any 
of these and the scores obtained in non-numerical questions. 

The final question which the present experiment set out to answer 
was: 
6. Is it possible to establish any general rules for the appropriate 
use of the various graphic, tabular, and textual forms? 

The answer to this question was in the affirmative. The rules, 
in terms of the present experiment, follow: 

I. For complex or slightly complex static comparisons, use a bar- 
graph. 
II. For extremely simple static comparisons use a pictograph. 
III. For dynamic comparisons use a line-graph.
IV. For specific amounts use a statistical table. 
V. For specific amounts use round numbers in numerical form 
(e.g., “5,000” not “5,622” nor “five thousand”). 
VI. For specific amounts use as few facts as possible. 
VII. Never present numerical data in textual (paragraph) form
if there are more than one or two items to be presented.

VIII. When numerical data are presented textually, use written 
numbers (e.g., five thousand dollars) for static and dynamic comparisons; 
and numerals (e.g., $5000) for specific amounts. 

IX. Use questions after a graph to emphasize its chief features. 
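
Rules I to IV, which assign a form to each type of fact, may be read as a small lookup. The Python sketch below is only one hypothetical way of writing them down: the function name `recommended_form` and the string labels for fact type and complexity are assumptions made for illustration, and rules V to IX, which concern number style, quantity of data and follow-up questions, are not encoded.

```python
def recommended_form(fact_type: str, complexity: str = "complex") -> str:
    """Return the form that rules I-IV recommend for one type of fact."""
    if fact_type == "static comparison":
        # Rule II covers only extremely simple data; rule I covers the rest.
        return "pictograph" if complexity == "extremely simple" else "bar-graph"
    if fact_type == "dynamic comparison":
        return "line-graph"  # Rule III
    if fact_type == "specific amount":
        return "statistical table"  # Rule IV
    raise ValueError(f"unknown fact type: {fact_type!r}")


print(recommended_form("static comparison"))                      # bar-graph
print(recommended_form("static comparison", "extremely simple"))  # pictograph
print(recommended_form("dynamic comparison"))                     # line-graph
print(recommended_form("specific amount"))                        # statistical table
```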

Discussion.—Some of the foregoing conclusions are based simply 
upon the superiority of some forms over all their competitors in 
particular groups of questions. Such conclusions need no explanation 
other than the scores upon which they are based. In several instances, 
however, the conclusions have been reached by the use of more difficult
comparisons. Such conclusions are discussed in the second part
of this article. 


(To be Concluded in October Issue.)
A SECOND STUDY OF MENTAL DISCIPLINE IN HIGH
SCHOOL STUDIES¹


CECIL R. BROYLER, E. L. THORNDIKE, AND ELLA WOODYARD 
Institute of Educational Research, Teachers College, Columbia University 


In the January and February 1924 issues of this Journal, we 
reported the facts concerning the gains in intelligence score during 
a year made by 8564 high-school pupils, according to the studies taken 
during the year. The general results were that the amount of gain 
bore only a slight relation to the studies taken. The bright gained 
more than the dull, and the white pupils more than the colored; but 
pupils who took, say, Latin, geometry, English, and history gained 
little more than pupils of equal intelligence who took arithmetic or 
bookkeeping, cooking or sewing, English, and history. 

Our procedure was first to discover by a rough method certain 
studies which were of about average influence, and then to compare 
students who took any given study with students of equal intelligence 
who took one of these studies of average influence (or nothing) in 
place of it. For example, representing one of these studies of average 
influence by I, representing civics, economics, psychology, or sociology 
by II, representing biology or agriculture by III, representing arith- 
metic or bookkeeping by IV, and representing geometry, algebra or 
trigonometry by V, we compared the gain of pupils taking I, II, III 
and IV with the gain of pupils taking V, II, III and IV. The influence
of taking V is thus compared with the influence of taking I. We also 
had certain comparisons between pupils who took V and pupils who 
took nothing in place of it (for example, I, II, III, V with I, II, III). 

Our studies of “about average influence” (which were in fact a
trifle below average influence) were business, drawing, English, history, 
music, shop, and Spanish. The symbol I represents any one of these. 

The difference in gain between a pupil taking a given subject and 
a pupil of the same sex and the same ability in the initial test of intelli- 
gence who took I or nothing in place of it, was as follows: 


For arithmetic or bookkeeping ............................. (IV)    +2.92
For chemistry, physics or general science ................. (IX)    +2.64
For algebra, geometry or trigonometry ..................... (V)     +2.33
For Latin or French ....................................... (VI)    +1.64
For physical training ..................................... (T)     + .66
For civics, economics, psychology or sociology ............ (II)    + .27
For dramatic art .......................................... (D)     - .29
For cooking, sewing, stenography .......................... (VIII)  - .47
For biology, zoology, botany, physiology or agriculture ... (III)   - .90
For manual training ....................................... (M)     Insufficient data


1 This investigation was made possible by a grant from the Commonwealth
Fund.

The unit is a little over 4 per cent of the average gain from the 
first test to the retest. It is about one-tenth of the difference between 
the gain in a year of an average white pupil in high-school and the gain 
in a year of an average colored pupil in high-school in a western city. 
The superiority in gain due to taking the highest study over taking the 
lowest is thus about two-fifths of the superiority in gain during one 
year of the average white over the average colored high-school pupil. 
Expressing the above differences as deviations from their own average 
we have: 


For arithmetic or bookkeeping ............................. (IV)    +1.94
For chemistry, physics or general science ................. (IX)    +1.66
For algebra, geometry or trigonometry ..................... (V)     +1.35
For Latin or French ....................................... (VI)    + .66
For physical training ..................................... (T)     - .32
For civics, economics, psychology or sociology ............ (II)    - .71
For dramatic art .......................................... (D)     -1.27
For cooking, sewing, stenography .......................... (VIII)  -1.45
For biology, zoology, botany, physiology or agriculture ... (III)   -1.88
For manual training ....................................... (M)
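The step from the first list to the second is a plain re-centering. A minimal sketch in Python, assuming only the nine differences printed above (the variable names are illustrative and not part of the report), subtracts their mean from each value. It also checks the "two-fifths" statement, taking the unit as one-tenth of the difference in yearly gain between the two pupil groups the report compares.

```python
# Differences in gain (relative to I or nothing) as listed in the report.
differences = {
    "IV": 2.92, "IX": 2.64, "V": 2.33, "VI": 1.64, "T": 0.66,
    "II": 0.27, "D": -0.29, "VIII": -0.47, "III": -0.90,
}

# Average of the nine differences (M is omitted: insufficient data).
average = sum(differences.values()) / len(differences)

# Each difference expressed as a deviation from that average.
deviations = {k: round(v - average, 2) for k, v in differences.items()}

# Spread between the highest and the lowest study, in the same unit.
spread = max(differences.values()) - min(differences.values())

# The report takes one unit as about one-tenth of the group difference
# in yearly gain, so the spread corresponds to this fraction of it.
fraction_of_group_difference = spread / 10
```

The resulting fraction is a little under two-fifths, in line with the statement in the text.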


The enormous practical importance of these facts, if they are true, 
made it highly desirable to repeat the experiment with other individ- 
uals. A grant from the Commonwealth Fund enabled the Division 
of Psychology of the Institute of Educational Research of Teachers 
College to do this. 

The same intelligence examinations (the I. E. R. Tests of Selective 
and Relational Thinking, Generalization and Organization, Forms A 
and B, described in Vol. V, No. 4 of the Journal of Educational Research, 
April, 1922) were used. At the opening of school in September, 1924, 
Form B was given to the pupils in Grades IX and X of City 2. Near 
the end of the term, in May, 1925, Form A was given to the same
pupils, except those who had left school in the meantime. At the
end of the school year 1924-1925 Form A was given to the pupils then
in Grades X and XI in City 1 and a year later Form B was given to so
many of them as were found in Grades XI and XII. A record was
kept of the studies taken by pupils in each city during the year which
elapsed between the first and the second examination. As a result,
we had available records of the gain in test score during the year and
the program of studies for a little over 5000 pupils.

1 Latin is the chief component of VI. In a sampling which we took, VI means
Latin in 1141 cases and French in 460, the proportion being as 71 and 29. We have
also evidence (presented later in Note A of this report) that the superiority of Latin
over French in producing gain in score in the tests of intelligence is approximately
zero.

We took over from the work of the earlier study the use of business, 
drawing, English, history, music, shop, and Spanish as a group of 
studies any one of which was of about the same influence upon gain 
in intelligence score as any other. Any one of these was entered as 
0 in a student’s program. The symbols II to IX, D, M and T have 
the same meanings as in the earlier study and as shown above (except 
that there were no students of agriculture in these cities, so that III is 
for biological subjects proper). The symbol X stands for physiog- 
raphy. A course in R. O. T. C. was recorded under T. 

Following the procedure described on pages 84 and 86 of the first 
report (this Journal, Feb., 1924) we obtain differences in gains between 
pupils whose programs were similar save that they took I or nothing in 
place of that subject, as shown in Table XVII.¹

Table XVII corresponds to Table XII of the first report. Table 
XVII compares the gain of the pupils who did take any subject with 
the gains of other pupils who took I or nothing in place of it. If those 
taking it and those taking I or nothing in place of it were identical in 
all else save the difference in program, Table XVII would give measures
of the comparative influence of the studies in question. But they were 
not. They differ in sex and in intellect as measured by the examina- 
tion in question. Since what we wish to know is the superiority or 
inferiority of one subject to another if the two subjects were taken 
by identical groups, we must make corrections to allow for any tend- 
encies of either sex to make a gain in general intellect during the 
year, and to allow for differences between the bright and the dull in 
such gain. 

1 To avoid possible confusion we begin numbering the tables of this report with
XVII.

In the data of 1922-1923 used in the first report there was a
slight superiority in such gain correlated with maleness and correction
was made for it. In the 1925-1926 data, being a boy rather than a
girl taking the same group of studies is found to involve in the average
a difference in gain of -.45 of a point. This difference is computed
from the following:

I. In study groups where n > 50 the weighted mean B-G difference
is -.75, with a weight of 584.

II. In study groups where n < 50 the weighted mean B-G difference
is -1.21, with a weight of 84.

III. In a random selection of pupils not used in I or II, the weighted
mean B-G difference is +.77, with a weight of 187.
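The combination of the three sources above can be sketched as an ordinary weighted mean. The snippet below is an illustration, not the authors' worksheet. It assumes the three printed means and weights are the only inputs. It yields about -.46, which agrees with the reported -.45 to within the rounding of the printed inputs.

```python
# (mean boy-minus-girl difference, weight) for sources I, II and III.
sources = [(-0.75, 584), (-1.21, 84), (0.77, 187)]

# Total weight across the three sources.
total_weight = sum(w for _, w in sources)

# Weighted mean of the three differences.
weighted_mean = sum(d * w for d, w in sources) / total_weight
```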


TABLE XVII.—THE DIFFERENCE IN GAIN BETWEEN PUPILS TAKING ANY GIVEN
SUBJECT AND THOSE WHO TAKE I OR NOTHING IN PLACE OF IT

                                            Weighted average
Subject                                Difference     Sum of weights

II.   Civics, etc. ..................     4.33             1475
III.  Biology, etc. .................      .46             1074
IV.   Arithmetic, etc. ..............     1.84              864
V.    Algebra, geometry, etc. .......     3.93             1636
VI.   Latin, etc. ...................     1.04             1626
VIII. Cooking, etc. .................    -0.09             1709
IX.   Physics, etc. .................     3.15             1173
D.    Dramatic art ..................      .46              131
M.    Manual training ...............      .94              321
T.    Physical training .............     -.06             1558
X.    Physiography ..................     2.83              265











Because this difference is very small compared to the variability of 
the gains of the entire population tested, and because the result 
obtained in the earlier study was larger in amount and opposite in 
sign, no corrections were made for sex in the case of the 1925-1926 
data. 

At the time of making the first report we made two determinations
of the greater or less amounts of gain to be expected from pupils of
greater or less initial ability, if they had taken identical programs.
One was by the median correlation between the sum of the intelligence
scores and gain (+.09) in groups each of which contained only pupils
of the same sex, taking the same programs.¹ The other was by the
fact that the colored pupils of City 1, with a median initial score of
117, made a median gain of 13.5 whereas the white pupils of the same
city, with a median initial score of 192, made a median gain of 23.5.
By the former, a difference of 1 point of initial score implies a difference
of .05 point of gain; by the latter it implies .133 point of gain. In the
first report, we used .05 as a very conservative allowance.

1 The word same here means “same in terms of the I, II, III, IV, . . . T
classification of page 377.”

We have made an independent determination of this correction for 
superiority or inferiority in intellect from the new 1925-1926 data, 
obtaining .085 as the allowance per point of difference in initial score. 
The essential facts of the determination are as follows: 

In the determination of the correction to be made for initial 
score, a table was prepared similar to Table XIV of the 1922-1923 
study, but using the 1925-1926 data; and the correlations between gain 
and sum of initial and final scores were calculated. The median 
correlation was the same as in the previous study. Using this with 
the variabilities of gain and initial score there reported, the 
allowance of .05 was again obtained. 

For some of the pupils there were available scores on a vocabulary 
test and on a paragraph reading test given to the pupils of one school 
in the spring of 1924. These people were then grouped into nine 
groups according to the magnitude of their paragraph reading 
scores. The mean initial scores and gains on the selective and general 
test were then calculated for each of these groups. The equation of 
the straight line of best fit representing gain as a function of 
initial score was then calculated; the amount of difference in gain 
corresponding to a difference of one unit in initial ability is, according 
to it, .09. 

There were, then, four independent determinations of the allow- 
ance for differences in initial score: .05, .05, .09 and .13. From weight- 
ing these results, Mr. Brolyer and Dr. Thorndike independently 
estimated the allowance to be used. One arrived at a result of .08, and 
the other at .09. The value finally determined upon was the average 
of these two, or .085. 
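The arithmetic behind these allowances can be sketched briefly. The snippet below is an illustration under stated assumptions. The slope uses the two median points quoted earlier for City 1. The unweighted mean is shown only for orientation, since the report says the results were weighted but does not give the weights.

```python
# Second determination from the first report: slope between the two
# median points (initial score, gain) quoted for City 1.
slope_from_medians = (23.5 - 13.5) / (192 - 117)   # about .133

# The four determinations named in the text.
determinations = [0.05, 0.05, 0.09, 0.13]

# ASSUMPTION: an unweighted mean, for orientation only; the weights
# actually used by the authors are not stated in the report.
unweighted_mean = sum(determinations) / len(determinations)

# The two independent estimates and the value finally adopted.
estimates = [0.08, 0.09]
allowance = sum(estimates) / len(estimates)
```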

We have therefore added .085 point of gain for each point of initial 
score below 180 and subtracted .085 of gain for each point of initial 
score above 180, so as to transmute our determinations of the gains of 
all the pupils into the probable gains which certain uniform intellects 
would have made if they had taken the program in question. We have 
then computed the facts of Table XVIII, which corresponds to Table
XVII, but is what is to be expected from pupils, all of equal initial
ability, by the allowance stated.

The net final results of the two investigations are then as shown in
Table XIX. They differ about to the extent that was indicated by 
the estimated PE’s of the 1922-1923 determinations. They agree in 
finding that the influence of taking one study rather than another 
upon the gain in intellectual power as measured by the test used was 
very small. The average for V (algebra, geometry or trigonometry) 
and VI (Latin or French) is only 1.89, which is but 1.81 greater than 
the average for VIII (cooking, sewing, stenography or typewriting) 
D (dramatic art) and T (physical training or R. O. T. C.), for persons 
of the same sex and initial ability. They agree in finding that certain 
studies select the abler intellects rather energetically. The pupil 
who takes Latin or French averages many points higher in initial 
score than the pupil who takes I or nothing in place of the Latin or 
French. The same is true to a less degree of algebra, geometry and 
trigonometry, and of the physical sciences. The reverse is true of 
cooking, sewing, stenography and typewriting, and of bookkeeping. 


TABLE XVIII.—THE DIFFERENCE IN GAIN BETWEEN A PUPIL TAKING A GIVEN
SUBJECT AND A PUPIL OF THE SAME INITIAL ABILITY WHO TAKES I OR
NOTHING IN PLACE OF IT: DATA OF 1925-1926

                                       Corrected           Sum of weights
Subject                                weighted average    used in the
                                       difference          correction

II.   Civics, etc. ..................     5.50                1475
III.  Biology, etc. .................      .60                1074
IV.   Arithmetic, etc. ..............     2.28                 864
V.    Algebra, geometry, etc. .......     3.64                1636
VI.   Latin, etc. ...................     -.07                1626
VIII. Cooking, etc. .................      .19                1709
IX.   Physics, etc. .................     2.77                1173
D.    Dramatic art ..................     -.67                 131
M.    Manual training ...............     1.86                 321
T.    Physical training .............     1.00                1558
X.    Physiography ..................     3.38                 265





They agree in disagreeing with the traditional doctrine that Latin,
algebra and geometry are the prime disciplinary subjects of the high-
school. The average for Latin, etc. and algebra, etc. is lower than the
value for physical science in both series for persons of the same sex and
initial ability.


TABLE XIX.—SUMMARY OF DIFFERENCES IN GAINS BETWEEN PUPILS TAKING
A GIVEN SUBJECT AND PUPILS TAKING I OR NOTHING IN PLACE OF IT: DATA
OF 1922-1923 AND 1925-1926

                                   Obtained differences       Estimated differences in
                                   in gains in comparison     gains in comparison with
                                   with I or nothing          I or nothing, for persons
                                                              of the same sex and ability
Subject                            1922-   1925-   Aver-      1922-   1925-   Aver-
                                   1923    1926    age        1923    1926    age

II.   Civics, etc. ..............    .50    4.33    2.42        .27    5.50    2.89
III.  Biology, etc. .............   -.85     .46    -.20       -.90     .60    -.15
IV.   Arithmetic, etc. ..........   2.73    1.84    2.29       2.92    2.28    2.60
V.    Algebra, geometry, etc. ...   3.10    3.93    3.52       2.33    3.64    2.99
VI.   Latin, etc. ...............   2.21    1.04    1.63       1.64    -.07     .79
VIII. Cooking, etc. .............  -1.51    -.09    -.80       -.47     .19    -.14
IX.   Physics, etc. .............   3.48    3.15    3.32       2.64    2.77    2.71
D.    Dramatic art ..............  -1.23     .46    -.39       -.29    -.67    -.48
M.    Manual training ...........   -.35     .94     .30
T.    Physical training .........    .24    -.06     .09        .66    1.00     .83
X.    Physiography ..............           2.83                       3.38


Besides repeating the earlier investigation with these 5000 new
individuals, we have utilized both the data of 1922-1923 and those of
1925-1926 by a different method. This method consists of collecting
two groups of students. The first comprises those who had programs
with a maximal amount of Latin (or French), algebra, geometry or
trigonometry, and physics, chemistry or general science, and no
“commercial” or “manual”¹ subjects, that is, no stenography, typewriting,
bookkeeping, accounting, business practice, business organization,
business law, commercial geography, salesmanship, cooking,
home economics, domestic science, dramatic art, manual training,
drafting, crafts, shop work, sewing, costume design, millinery, or
dressmaking. The second comprises those who had programs with
maximal amounts of commercial or manual subjects, and with no VI
(Latin or French), V (algebra, geometry, trigonometry), IX (physics,
chemistry or general science), and also no III (biology).

1 Including the domestic arts.

The first group is subdivided into groups with programs as follows:


V, VI, IX and a subject or subjects not commercial or manual.¹
V, VI and a subject or subjects not commercial or manual.
V, IX and a subject or subjects not commercial or manual.
VI, IX and a subject or subjects not commercial or manual.
V and a subject or subjects not commercial or manual.
VI and a subject or subjects not commercial or manual.
IX and a subject or subjects not commercial or manual.


The second group is subdivided into groups with programs as 
follows: 
Commercial subjects and a subject or subjects not III, V, VI, or IX.
Manual subjects and a subject or subjects not III, V, VI, or IX.
Commercial and manual subjects and a subject or subjects not III, V, VI or IX.


These groups will be referred to hereafter as V VI IX, V VI, V IX, 
VI IX, V, VI, IX, C, M and CM. 
For certain reasons we also sorted out and studied four groups with 
programs as follows: 
III and a subject or subjects not commercial or manual.
III, V and a subject or subjects not commercial or manual.
III, VI and a subject or subjects not commercial or manual.
III, V, VI and a subject or subjects not commercial or manual.


These groups will be referred to hereafter as III, III V, III VI, 
and III V VI. If a student’s program had both a III and a IX, it 
was allotted to the IX group. 

The detailed nature of the composite program for each of these 
groups is shown in Tables XX and XXI. We have a range from 
groups who are spending half their time on mathematics, Latin (or 
French) and physical science to groups who are spending half of their 
time on stenography, typewriting, bookkeeping and the like. The 
balance of the programs is much alike. In no case is there any mixture
of V, VI, or IX with any commercial or manual subjects in the
same program. 

1 The subject of arithmetic was treated not exactly as an indifferent subject, but
in a specific way. If arithmetic occurred in a program which had V, V VI, or
V VI IX, that program was left in the group. If it occurred in a program which
had commercial courses that program was left in the commercial group. Otherwise
a program which had arithmetic was not used in any of the groups mentioned
above or in the III group mentioned later.

We have computed the average initial score and average gain for
each of these groups. In order to provide a convenient rough estimate
of reliabilities, this is first done separately for City 1, 1922-1923,¹
City 2, 1922-1923, and Cities 1 and 2, 1925-1926; and the three sets of
data are then combined. The results appear in Table XXII. If the
gains of Table XXII are corrected for differences in initial ability by
an allowance of +.05 for each point of initial score below 180, and of
-.05 for each point of initial score above 180, we have the estimates
of Table XXIIIA. If the allowance is .085 instead of .05, we have the
estimates of Table XXIIIB. With this material we have nowhere
made any allowance for sex.
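The correction described here is linear in the initial score. A minimal sketch follows; the function name and signature are illustrative and not taken from the report. It is applied to the combined figures of one group in Table XXII (n = 175, average initial score 186.73, average gain 20.49).

```python
def corrected_gain(gain, initial_score, allowance, reference=180.0):
    """Gain adjusted to a pupil whose initial score equals `reference`.

    Adds `allowance` per point of initial score below the reference and
    subtracts it per point above, as the text describes.
    """
    return gain + allowance * (reference - initial_score)


# Combined ("All") figures for one group as read from Table XXII.
initial, gain = 186.73, 20.49

gain_a = round(corrected_gain(gain, initial, 0.05), 2)    # allowance of .05
gain_b = round(corrected_gain(gain, initial, 0.085), 2)   # allowance of .085
```

The two results agree with the entries 20.15 and 19.92 printed for the group of 175 pupils in Table XXIII.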


TABLE XXII.—THE AVERAGE INITIAL SCORES AND AVERAGE GAINS OF GROUPS
TAKING PROGRAMS ABOUNDING IN LATIN OR FRENCH, MATHEMATICS, PHYSICAL
SCIENCE, BIOLOGICAL SCIENCE, WITH NO COMMERCIAL OR MANUAL
SUBJECTS AND OF GROUPS TAKING PROGRAMS ABOUNDING IN COMMERCIAL
OR MANUAL SUBJECTS WITH NO III, V, VI, OR IX

                          City 1,                City 2,                Cities 1 and 2,        All
                          1922-1923              1922-1923              1925-1926
Program                     n  Initial   Gain      n  Initial   Gain      n  Initial   Gain       n  Initial   Gain

III ...................    41   187.42  21.30     30   182.87  21.95    104   187.58  19.74     175   186.73  20.49
V .....................   140   172.41  24.08    124   189.99  17.51    377   176.08  26.32     641   177.97  24.13
VI ....................    79   208.10  23.33     60   206.41  23.68     83   211.07  23.48     222   208.75  23.48
IX ....................   104   202.29  29.10     57   206.02  21.66    132   214.36  24.82     293   208.45  25.73
III V .................    69   177.72  22.35     43   194.84  21.06    139   190.86  24.75     251   187.93  23.46
III VI ................    68   191.79  18.93     45   204.91  21.70     88   206.36  18.41     201   201.10  19.32
V VI ..................   376   207.62  25.38    178   201.22  25.68    656   189.22  29.62    1210   196.70  27.73
V IX ..................   155   200.26  31.38    184   212.62  21.22    304   191.72  28.13     643   199.73  26.94
VI IX .................    99   219.70  29.16     71   217.67  25.67    113   207.50  22.62     283   214.32  25.67
III V VI ..............   123   186.02  26.95     56   210.20  23.51     74   203.42  24.56     253   196.46  25.49
V VI IX ...............   103   208.44  32.48    140   209.60  28.19    212   200.86  26.09     455   205.27  28.18
Commercial ............   280   173.82  17.72    211   181.46  18.37    335   183.38  17.55     826   179.65  17.82
Manual ................    46   182.36  15.90     24   174.00  24.77    122   159.08  18.34     192   166.52  18.56
Commercial or manual ..  [illeg.] [illeg.] 17.83  [illeg.] [illeg.] [illeg.]  242   175.35  18.26     548   174.13  18.26








1 “City 1, 1922-1923” here and elsewhere means all the cities and towns of the
1922-1923 investigation except City 2.


TABLE XXIII.—THE ESTIMATED AVERAGE GAINED FOR PUPILS OF AN ABILITY
REPRESENTED BY AN INITIAL SCORE OF 180, IF SUCH PUPILS SHOULD TAKE
THE PROGRAMS AS SPECIFIED FOR GROUPS III, V, VI, ETC. IN TABLES XX
AND XXI

                                         A. Gain according    B. Gain according
Group                              n     to the allowance     to the allowance
                                         of .05               of .085

III ...........................   175         20.15                19.92
V .............................   641         24.23                24.30
VI ............................   222         22.04                21.04
IX ............................   293         24.31                23.31
III V .........................   251         23.06                22.79
III VI ........................   201         18.28                17.54
V VI ..........................  1210         26.90                26.31
V IX ..........................   643         25.95                25.26
VI IX .........................   283         23.95                22.75
III V VI ......................   253         24.67                24.09
V VI IX .......................   455         26.92                26.03
Commercial ....................   826         17.84                17.85
Manual ........................   192         19.23                19.71
Commercial or manual ..........   548         18.55                18.76


Let us now see how the results by this method compare with those
by the method used in the main inquiry and reported in Table XIX.
Let us begin with the relative gains due to a III vs. a IX, to biology
vs. physics or chemistry or astronomy. By Table XIX, a IX had a
superiority of 3.52 points of gain over a III, in the uncorrected averages,
and 2.86 in the corrected averages. By Tables XXII and XXIII,
a IX has a superiority over a III of about 4.2 in the uncorrected results,
and of about 3.5 or 3.1 by the corrected results (according as .05 or
.085 is the basis of correction). These superiorities are computed as
follows:

Groups III and IX are much alike in program except that .97 of a 
IX replaces 1.12 of a III and that the IX group took less work in 
general. The observed superiority of group IX over group III was 
5.24. By the .05 and .085 corrections it was, respectively, 4.16 and 
3.49. 

Groups III V and V IX are much alike in program except that 
1.23 of a IX replaces 1.04 of a III and that the V IX group took more 
work in general. The observed superiority of group V IX over group 
III V was 3.48. By the .05 and .085 corrections it was, respectively, 
2.89 and 2.47. 

Groups III VI and VI IX are much the same in program except 
that 1.03 of a IX replaces 1.06 of a III and that the VI IX group took 
less work in general, especially in history. The observed superiority of 
group VI IX over group III VI was 6.35. By the .05 and .085 corrections
it was, respectively, 5.67 and 5.21.

Groups III V VI and V VI IX are much alike in program except
that 1.02 of a IX replaces 1.04 of a III and that the V VI IX group
took more work in general. The observed superiority of group V
VI IX over group III V VI was 2.69. By the .05 and .085 corrections
it was, respectively, 2.25 and 1.94.

We have then roughly as determinations of superiority in gain: 











                                                         Corrected
                                               Raw    By .05   By .085

For  .97 IX compared with 1.12 III .........   5.24    4.16     3.49
For 1.23 IX compared with 1.04 III .........   3.48    2.89     2.47
For 1.03 IX compared with 1.06 III .........   6.35    5.67     5.21
For 1.02 IX compared with 1.04 III .........   2.69    2.25     1.94
Summing:
For 4.25 IX compared with 4.26 III .........  17.76   14.97    13.11
For 1.00 IX compared with 1.00 III .........   4.18    3.52     3.08
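The summing and the reduction to one unit of IX can be reproduced with a few lines. This is a sketch assuming only the four printed rows; the tuple layout and names are illustrative.

```python
# (units of IX, units of III, raw, corrected by .05, corrected by .085)
pairs = [
    (0.97, 1.12, 5.24, 4.16, 3.49),
    (1.23, 1.04, 3.48, 2.89, 2.47),
    (1.03, 1.06, 6.35, 5.67, 5.21),
    (1.02, 1.04, 2.69, 2.25, 1.94),
]

units_ix = sum(p[0] for p in pairs)       # about 4.25 units of IX
units_iii = sum(p[1] for p in pairs)      # about 4.26 units of III

# Column totals for the raw and the two corrected superiorities.
totals = [sum(p[i] for p in pairs) for i in (2, 3, 4)]

# Superiority per 1.00 unit of IX.
per_unit = [round(t / units_ix, 2) for t in totals]
```

Dividing by the 4.25 units of IX gives the last row of the table; the 4.26 units of III are treated there as equivalent to 1.00.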














In neither these nor the earlier determinations by the first method,
where a course in III or IX was balanced against a course in I or
nothing, was “a course” rigidly defined. It may have occasionally
been a half-year instead of a full year, and it may (though very rarely)
have been more than one full-year course. It is probable that the
amount represented by an entry of a III or a IX in general would be
on the average somewhat less than the amount represented by 1.00
III or 1.00 IX in the selected special programs used in applying
the second method, since there was a greater chance for a half-year
to be counted as a year in the use of the first method. The discrepancy
between the 3.52 and 2.86 derived from Table XIX and the 4.18
and 3.08 above would probably be somewhat reduced if there had
been more complete uniformity in the counting of “a course.”

The approximate reliabilities of the 4.18, 3.52 and 3.08 can be 
computed from our data, since the four pairs of groups contain no 
individual twice. The errors of estimate of the 4.18, 3.52 and 3.08 
will be somewhat less than those obtained for the average of 5.24, 3.48, 
6.35 and 2.69, for the average of 4.16, 2.89, 5.67 and 2.25, and for the 
average of 3.49, 2.47, 5.21 and 1.94. These are, respectively, in terms
of the mean square error, .7, .6¼ and .6.

As our next comparison we take that of mathematics vs. Latin or
French. By the earlier method we have in Table XIX an average 
superiority of 1.89 in obtained gains for a V over a VI, which becomes 
2.20 after correction. Using the facts of Tables XXII and XXIII, 
obtained by the second method, a V has a superiority over a VI of 
about 1.9 in the uncorrected results and of about 2.8 and 3.4 by the 
corrected results, according as .05 or .085 is the basis of correction. 

These superiorities are computed as follows: 

Groups V and VI are much alike except that 1.19 of a VI replaces 
1.09 of a V, and that the VI group has more English and history and 
less Spanish. The superiorities of V over VI (observed, corrected 
by .05, and corrected by .085) are, respectively .65, 2.19 and 3.26. 
Groups III V and III VI are much alike except that 1.08 of VI replace 
1.03 of V. The respective superiorities of III V over III VI are 4.14,
4.78 and 5.25. 

Groups V IX and VI IX are unlike in the replacement of 1.08 of 
V by 1.15 of VI and in the fact that the pupils of the VI IX groups take 
very much less work, especially in drawing, Spanish, civics, IX, and 
physical training. The respective superiorities of V IX over VI IX 
are 1.27, 2.00 and 2.51. 

These three comparisons are not so well equalized in respects other than
V vs. VI as would be desirable. Taking them as they are, we have as
the determinations of the superiority of V to VI in gain:











                                                         Corrected
                                               Raw    By .05   By .085

For 1.09 V compared with 1.19 VI ...........    .65    2.19     3.26
For 1.03 V compared with 1.08 VI ...........   4.14    4.78     5.25
For 1.08 V compared with 1.15 VI ...........   1.27    2.00     2.51
Summing:
For 3.20 V compared with 3.42 VI ...........   6.06    8.97    11.02
For 1.00 V compared with 1.07 VI ...........   1.89    2.80     3.44














The procedure followed in this and later applications of this second 
method of estimating the influence of taking one study rather than 
another consists in comparing two groups, much alike in other respects, 
one of which has a certain amount of study A, the other of which has a 
certain amount of study B, with, in some cases, an allowance for such
differences in other respects as do occur. These comparisons are presented
in equation form though they are perhaps more like chemical
equations than numerical equations.
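The comparison of two groups can be sketched as follows. The class, the function names and the output format are illustrative assumptions; the figures are the combined averages of groups V and VI in Table XXII and the unit amounts quoted for them in the text.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str        # study that distinguishes the group, e.g. "V"
    units: float     # amount of that study in the composite program
    initial: float   # average initial score
    gain: float      # average gain


def corrected(g, allowance, reference=180.0):
    """Average gain of group g adjusted to an initial score of `reference`."""
    return g.gain + allowance * (reference - g.initial)


def superiority(a, b, allowance=0.0):
    """Gain of group a minus gain of group b under the given allowance."""
    return corrected(a, allowance) - corrected(b, allowance)


def equation(a, b, allowance=0.0):
    """One line in the 'equation form' used for the comparisons."""
    diff = superiority(a, b, allowance)
    return f"{a.units:.2f} {a.name} compared with {b.units:.2f} {b.name}: {diff:.2f}"


v = Group("V", 1.09, 177.97, 24.13)
vi = Group("VI", 1.19, 208.75, 23.48)

raw = superiority(v, vi)
by_05 = superiority(v, vi, 0.05)
by_085 = superiority(v, vi, 0.085)
```

Working from the unrounded corrected gains can shift the second decimal slightly relative to the printed figures, which appear to be differences of already rounded table entries.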

The superiorities of 1.00 V over 1.00 VI would be only a little, if 
any, greater than the above, because the .07 of VI is not much superior 
in influence to the .07 of drawing, Spanish, civics, IX, etc., which the 
V groups had in place of it. The mean square errors of estimate of
the 1.89, 2.80 and 3.44 will be somewhat less than .8½, .7¼ and .7,
which are the errors for the averages of the .65, 4.14 and 1.27, etc.

In the case of V vs. IX, the earlier treatment gave a superiority in 
gain of .20 to IX by the obtained results and a superiority of .28 to 
V by the corrected results. By the second method V has a superiority 
over IX of about .2 in the uncorrected results and of about 1.3 and 2.1 
by the use of .05 and .085 respectively as the correction factor; but 
these results have large probable errors. 

These superiorities are computed by comparing the facts for group 
V and group IX, and for group V VI and group VI IX. The groups
taking mathematics take more work in general than the groups taking 
physical science, 4.97 to 4.32 courses and 5.32 to 4.67 courses, but half 
of this excess is in physical training. The superiorities of V over IX 
are: 














                                                         Corrected
                                               Raw    By .05   By .085

Using V and IX:
For 1.09 V compared with  .97 IX ...........  -1.60    -.08      .99
Using V VI and VI IX:
For 1.07 V compared with 1.23 IX ...........   2.06    2.95     3.56
Summing:
For 2.16 V compared with 2.20 IX ...........    .46    2.87     4.55
For 1.00 V compared with 1.02 IX ...........    .21    1.33     2.10








The mean square errors of estimate of the .21, 1.33 and 2.10 cannot
be determined with precision; 1.6, 1.4 and 1.2 are reasonable estimates
obtained by using the -1.60 and 2.06, etc. and estimating the mean
square deviation as 1.25331 times the obtained average deviation.
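The factor 1.25331 is the ratio of the standard deviation to the mean absolute deviation for a normal distribution, that is, the square root of π/2. A short sketch of that conversion follows; the function name is illustrative. How the report passes from this deviation to the quoted 1.6, 1.4 and 1.2 is not spelled out, so the sketch stops at the deviation itself.

```python
import math

# For a normal distribution the standard deviation equals
# sqrt(pi / 2) times the mean absolute deviation.
FACTOR = math.sqrt(math.pi / 2)


def sd_from_average_deviation(values):
    """Estimate the standard deviation from the average deviation."""
    mean = sum(values) / len(values)
    average_deviation = sum(abs(v - mean) for v in values) / len(values)
    return FACTOR * average_deviation


# The two raw determinations of V over IX named in the text.
sd_raw = sd_from_average_deviation([-1.60, 2.06])
```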

Treating the cases of III and V, III and VI, and VI and IX in the 
way shown above for III and IX, V and VI, and V and IX, we have 
the following facts:


SUPERIORITIES OF V TO III

                                                         Corrected
                                               Raw    By .05   By .085

For 1.09 V compared with 1.12 III ..........  [illeg.] [illeg.] [illeg.]
For 1.07 V compared with 1.06 III ..........  [illeg.] [illeg.] [illeg.]
Summing:
For 2.16 V compared with 2.18 III ..........  [illeg.] [illeg.] [illeg.]
For 1.00 V compared with 1.01 III ..........  [illeg.] [illeg.] [illeg.]

By the first method we had 3.72 raw and 3.14 corrected.
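The superiorities of V to III can be recomputed from the group averages by the procedure already used for III and IX. The sketch below is a reconstruction, not the authors' figures. It assumes that the first comparison sets group V against group III, that the second sets group V VI against group III VI (inferred from the unit amounts 1.07 V and 1.06 III), and that the gains are those of the corresponding groups in Tables XXII and XXIII.

```python
# (raw gain, gain corrected by .05, gain corrected by .085) per group,
# as read from Tables XXII and XXIII. ASSUMPTION: the group labels and
# the pairing of groups are inferred from the unit amounts in the text.
gains = {
    "III":    (20.49, 20.15, 19.92),
    "V":      (24.13, 24.23, 24.30),
    "III VI": (19.32, 18.28, 17.54),
    "V VI":   (27.73, 26.90, 26.31),
}


def difference(a, b):
    """Superiority of group a over group b: raw, by .05, by .085."""
    return tuple(round(x - y, 2) for x, y in zip(gains[a], gains[b]))


row_1 = difference("V", "III")          # 1.09 V compared with 1.12 III
row_2 = difference("V VI", "III VI")    # 1.07 V compared with 1.06 III

# Column sums and the reduction to 1.00 unit of V.
summed = tuple(round(x + y, 2) for x, y in zip(row_1, row_2))
units_v = 1.09 + 1.07
per_unit = tuple(round(s / units_v, 2) for s in summed)
```

Under these assumptions the raw and the .085 values per unit come out near 5.6 and 6.1, the figures that head the lists of observed differences by method II given later in the report.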
SUPERIORITIES OF VI TO III

                                                         Corrected
                                               Raw    By .05   By .085

1.19 VI compared with 1.12 III .............   2.99    1.89     1.12
1.10 VI compared with 1.04 III .............   4.27    3.84     3.52
Summing:
2.29 VI compared with 2.16 III .............   7.26    5.73     4.64
1.00 VI compared with  .94 III .............   3.17    2.50     2.03

By the first method we had 1.82 raw and .94 corrected for the superiority of a VI
over a III.


SUPERIORITIES OF IX TO VI

                                                         Corrected
                                               Raw    By .05   By .085

 .97 IX compared with 1.19 VI ..............   2.25    2.27     2.27
1.23 IX compared with 1.10 VI ..............   -.79    -.95    -1.05
Summing:
2.20 IX compared with 2.29 VI ..............   1.46    1.32     1.22
1.00 IX compared with 1.04 VI ..............    .66     .60      .55

By the first method we had 1.69 raw and 1.92 corrected.


By the second method V has greater superiority than it had by the 
first, gaining 1.9, 0, and .4 over III, VI and IX respectively in the raw 
or obtained results. III has greater inferiority than it had by the 
first method, losing 1.9, 1.4 and .7 to V, VI and IX respectively in the 
raw results. VI gains 1.4, 0 and 1.0 over III, V and IX respectively. 
IX gains .7 over III and loses .4 to V and 1.0 to VI.

By the first method the influence of a III, a V, a VI, and a IX, all
together, was, by the obtained results, -.20 + 3.52 + 1.63 + 3.32
(or 8.27). Assuming that this total is valid, the influences by method
II could be assigned as follows: III, -1.2; V, +4.2; VI, +2.3; IX,
+3.0.¹

This makes a close fit to the observed differences, as shown below: 


DIFFERENCES OBSERVED BY METHOD II AND DIFFERENCES CALCULATED FROM III,
-1.2; V, +4.2; VI, +2.3; IX, +3.0

                                   Observed   Calculated      Δ
Between V and III ..............     5.6         5.4         - .2
Between VI and III .............     3.2         3.5         + .3
Between IX and III .............     4.2         4.2           0
Between V and VI ...............     1.9         1.9          .0
Between V and IX ...............      .2         1.2         +1.0
Between IX and VI ..............      .7          .7           0
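The fit can be checked by differencing the four assigned influences pair by pair. This is a sketch under stated assumptions. The pairing of each observed value with a pair of studies is inferred from the per-unit superiorities derived earlier, and the observed value for IX over VI is taken as .7 from the .66 of the preceding table.

```python
# Influences assigned by method II (raw results).
influence = {"III": -1.2, "V": 4.2, "VI": 2.3, "IX": 3.0}

# Observed superiorities by method II, first study over second.
# ASSUMPTION: row order inferred from the per-unit values derived above.
observed = {
    ("V", "III"): 5.6, ("VI", "III"): 3.2, ("IX", "III"): 4.2,
    ("V", "VI"): 1.9, ("V", "IX"): 0.2, ("IX", "VI"): 0.7,
}

# Difference of assigned influences for each pair.
calculated = {
    pair: round(influence[pair[0]] - influence[pair[1]], 1)
    for pair in observed
}

# Calculated minus observed, the column headed by the delta sign.
residual = {
    pair: round(calculated[pair] - observed[pair], 1) for pair in observed
}

# Total of the assigned influences, to compare with the 8.27 of method I.
total = round(sum(influence.values()), 1)
```

Only the comparison of V with IX leaves a residual as large as one point, which matches the text's remark that the fit is close.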














The results by the two methods and their averages are then: 


SUPERIORITIES IN GAIN 








                 By method I   By method II   Average

III ..........      - .2          -1.2          - .7
V ............       3.5           4.2           3.9
VI ...........       1.6           2.3           2.0
IX ...........       3.3           3.0           3.2














By any reasonable allowance for the greater initial ability of the 
pupils who take V or VI or IX in place of I or nothing, III will make a 
little better showing, VI and IX will be reduced about equally. 

By making a similar treatment, but using the facts as estimated 
for pupils all of equal initial ability by the .085 correction, we find the 
following: V has a much greater superiority than it had by the first 
method, gaining 3.0, 1.2 and 1.8 over III, VI, and IX, respectively. 
III has greater inferiority than it had by the first method, losing 3.0, 
1.1 and .2 to V, VI, and IX, respectively. VI gains 1.1 on III, loses 
1.2 to V and gains 1.3½ on IX. IX gains .2 on III, loses 1.8 to V and
loses 1.3½ to VI.


1 There are, of course, many other reasonable assignments, none of which will 
diverge greatly from this. 

By the first method the influence of a III, a V, a VI, and a IX, all
together, was −.15 + 2.99 + .79 + 2.71 or 6.34. Assuming that the
total is valid, the influences by method II could be assigned as follows:
III, −1.0; V, +4.6; VI, +1.2; IX, +1.6. This gives a fairly close
fit of calculated to observed differences, as follows:


























                           Observed   Calculated      Δ
Between V and III .......    6.1         5.6         − .5
Between VI and III ......    2.0         2.2         + .2
Between IX and III ......    3.1         2.6         − .5
Between V and VI ........    3.4         3.4           .0
Between V and IX ........    2.1         3.0         + .9
Between IX and VI .......     .5½         .4         − .1½

The results by the two methods and their averages are then:

SUPERIORITIES IN GAIN

            By method I   By method II   Average
III ......     − .1½         −1.0         − .6
V ........     +3.0          +4.6          3.8
VI .......     + .8          +1.2          1.0
IX .......     +2.7          +1.6          2.2














We may now consider the facts obtained by method II concerning
the superiority of programs rich in mathematics, Latin and the physical 
sciences to programs in which their place is taken by cooking, sewing, 
stenography, typewriting, and manual arts. We shall, that is, con- 
sider C, M, and C M groups in comparison with the V VI group, the 
V IX group, the VI IX group and the V VI IX group. 

The C group and the V VI group are much the same in program 
except that 2.20 of stenography, typewriting, bookkeeping and other 
commercial subjects replaces 1.07 of mathematics and 1.10 of Latin and 
French, and that there are .26 of a course more of Spanish, .36 of a 
course less of physical training, and a little less of English, history, 
drawing and music. According to method I we should expect a raw
superiority for the V VI group of about 5.7. This is computed by
(1.07 × 3.52) + (1.10 × 1.63) + (1.39 × .80) + (.36 × .09) −
(.43 × 2.29). By method II we have a raw superiority of 9.91. As a
corrected superiority we should expect about 3.4. This is computed
by (1.07 × 2.99) + (1.10 × .79) + (1.39 × .14) + (.36 × .83) −
(.43 × 2.60). By method II we have, after correction by .085, a
superiority of 8.46.
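
The method I estimate is a weighted sum: each difference in courses between the two programs is multiplied by the value of one course of that study. The Python sketch below is an added illustration under stated assumptions. The dictionary layout and the names `RAW`, `CORRECTED`, `PROGRAM_DIFFERENCE` and `expected_superiority` are not in the original; VIII (stenography, typewriting, etc.) is entered with the negative values −.80 and −.14 implied by the signs of the terms above, and the physical-training difference is taken as the .36 of a course stated in the prose.

```python
# Value of one course by method I, raw and corrected, as used in the text.
RAW = {"V": 3.52, "VI": 1.63, "VIII": -0.80, "physical training": 0.09, "IV": 2.29}
CORRECTED = {"V": 2.99, "VI": 0.79, "VIII": -0.14, "physical training": 0.83, "IV": 2.60}

# Courses the V VI group has beyond (+) or short of (-) the C group.
PROGRAM_DIFFERENCE = {
    "V": 1.07,
    "VI": 1.10,
    "VIII": -1.39,               # C has 1.39 more of VIII
    "physical training": 0.36,
    "IV": -0.43,                 # C has .43 more bookkeeping and arithmetic
}


def expected_superiority(values: dict, difference: dict) -> float:
    """Sum of (course difference x value per course) over all studies."""
    return sum(difference[study] * values[study] for study in difference)


raw_estimate = expected_superiority(RAW, PROGRAM_DIFFERENCE)
corrected_estimate = expected_superiority(CORRECTED, PROGRAM_DIFFERENCE)
print(round(raw_estimate, 1), round(corrected_estimate, 1))
```

The same function applies to the other group comparisons if `PROGRAM_DIFFERENCE` is replaced by the differences listed for those groups.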

The C group and the V IX group are much the same in program 
except that the 2.20 of commercial subjects replaces 1.08 of mathe- 
matics and 1.23 of physics, etc., and that the C group has a much lighter 
program, with less of everything else except music and dramatic art 
than the V IX group. According to method I we should expect a
raw superiority for the V IX group of about 8.5. This is computed
by (1.08 × 3.52) + (1.23 × 3.32) + (1.39 × .80) − (.43 × 2.29) +
(.14 × 2.42) + (.04 × 2.29) + (.25 × .09). By method II we have
a raw superiority of 9.12. As a corrected superiority, we should expect
from the estimates of method I about 6.4. This is computed by (1.08
× 2.99) + (1.23 × 2.71) + (1.39 × .14) − (.43 × 2.60) + (.14 ×
2.89) + (.04 × 2.60) + (.25 × .83). By method II we have a
superiority of 7.41, after the .085 correction has been applied.

Groups VI IX and C are much alike except that the 2.20 of 
commercial subjects replaces 1.15 of Latin and French and 1.03 of 
physics, etc., and that the C group has more Spanish and civics. By
method I we should expect a raw superiority for the VI IX group of
about 5.2. This is computed by (1.15 × 1.63) + (1.03 × 3.32) +
(1.39 × .80) − (.09 × 2.42) + (.07 × −.20) − (.43 × 2.29). By
method II we have 7.85.

By method I we should expect a corrected superiority of about 2.5.
This is computed by (1.15 × .79) + (1.03 × 2.71) + (1.39 × .14) −
(.09 × 2.89) + (.07 × −.15) − (.43 × 2.60). By method II, using
.085, we have 4.90.

We may check the results by method I further by using the superior- 
ity of III V, and of III VI, to C. 

Group III V and group C are much alike except that 2.20 commer- 
cial subjects is replaced by 1.04 III and 1.03 V, and that C has .18 
more of civics and .21 less of physical training. The raw superiority
expected by method I is then estimated as about 3.1 from (1.04 ×
−.20) + (1.03 × 3.52) + (1.39 × .80) − (.43 × 2.29) − (.18 ×
2.42) + (.21 × .09). By method II it is 5.64. The corrected
superiority by method I is estimated as about 1.7 from (1.04 × −.15) +
(1.03 × 2.99) + (1.39 × .14) − (.43 × 2.60) − (.18 × 2.89) + (.21
× .83). By method II it is found to be 4.94.

Group III VI and group C are very much alike in program except 
that 1.06 III and 1.08 VI replace the 2.20 of commercial subjects and 
that C has more Spanish and less history. The raw superiority of
III VI is estimated by method I as about 1.7 from (1.06 × −.20) +
(1.08 × 1.63) − (1.39 × −.80) − (.43 × 2.29). By method II
it is found to be 1.50. The corrected superiority is estimated by
method I as about −.2 from (1.06 × −.15) + (1.08 × .79) − (1.39 ×
−.14) − (.43 × 2.60). By method II, using .085, it is found to be
−.3.

On the whole, method II gives about 2.0 greater superiority to 
the intellectualistic studies over stenography and typewriting, than 
method I, as shown below: 











                     Raw                          Corrected
          Method I  Method II     Δ      Method I  Method II      Δ
                                                   (using .085)
V VI ...     5.7       9.9       4.2        3.4       8.5        5.1
V IX ...     8.5       9.1        .6        6.4       7.4        1.0
VI IX ..     5.2       7.9       2.7        2.5       4.9        2.4
III V ..     3.1       5.6       2.5        1.7       4.9        3.2
III VI .     1.7       1.5      − .2       − .2      − .3       − .1
Average                          2.0                             2.3


























The most extreme case of difference in programs in respect of the 
amount of intellectualistic subjects is that between the V VI IX group 
of 455 pupils and the commercial and manual group of 548 pupils; 3.31 
of mathematics, Latin, French, physics, etc., replace 1.98 of stenog- 
raphy, typing, cooking and sewing, .95 other commercial and shop, .17 
Spanish and .19 civics. The V VI IX group has 5.87 in all to 5.47, 
the .40 excess being mostly in English, drawing and physical training. 
By method I the estimated raw superiority is about 9.7, from (1.23 ×
3.52) + (1.06 × 1.63) + (1.02 × 3.32) + (1.98 × .80) − (.19 ×
2.42) + (.13 × .09) − (.38 × 2.29) − (.06 × .30). The raw
superiority found by method II is 9.92. By method I the estimated
corrected superiority is about 6.1, from (1.23 × 2.99) + (1.06 × .79) +
(1.02 × 2.71) + (1.98 × .14) − (.19 × 2.89) + (.13 × .83) − (.38 ×
2.60) − (.06 × .30). The corrected superiority found by method II,
using .085, is 7.27. Using these groups, method II gives a somewhat
greater superiority (.2 for the raw, and 1.2 for the corrected) to the
intellectualistic studies over stenography, typewriting, cooking and
sewing, than method I.

In these computations with C and with C M, however, bookkeeping 
has been credited with the value of study IV, which is arithmetic and 
bookkeeping. It is probable that the value of arithmetic is higher than 
that of bookkeeping, and that consequently our estimates of the 
superiorities of V VI, etc. to C by method I are all too low. 

We may estimate the influence of 1.00 VIII by method II by 
comparing C with the average of III, V, VI and IX and by comparing 
C M with the average of V VI and V IX and VI IX, as follows: 

The average of III, V, VI and IX, with equal weight attached to 
each group, gives the results shown in Table XXIV. C differs from 
this average in having 1.39 VIII in place of 1.14 of III or V or VI or IX, 
in having .04 less civics, .06 less dramatic art, .13 less physical training, 
.42 more bookkeeping and arithmetic and .14 less of neutral subjects.
The raw value of the 1.14 of III, V, VI and IX by method II is 1.14 ×


TABLE XXIV. THE AVERAGE NUMBER OF COURSES PER PUPIL IN C, C + M,
AND TWO COMPOSITES

                Composite of                Composite of
                III, V, VI                  V VI, V IX
                and IX            C         and VI IX       C + M

Cas od cab é04bhb de eens 1.00 .93 1.03 .84 
CE Sue écutca hues arden wane e .76 .59 .57 .56 
| PCS ee ee ore 34 .14 .30 17 
Cit wider abuih oe oeckw ee -20 .16 16 .10 
EE ee ee ee ee .38 .34 18 .23 
NE Fe 02% .02 01 .03 
a dead ankle heen o' 354% | .3i1 .33 | .34 
I os cs b cauekw vives tek .32 eae .04 
PL a bclecae ys ones bhGae oe .02 .O1 .02 
We di ckhaneR xs Kade cae 27 ee .72 
WE eniigitnsnt aadee ean .30 or .75 
ile Ec nccchouin ta de ones .25 ae .75 
er .09 .03 .05 .03 
T. Physical training.............. .334 .20 41 .24 
Oe a waqse. vise ch ase “sie 1.39 “+ 1.98 
te od eeeieeh! “sapien .43 vin .38 
0 AT Te 38 | whee 26 
eee eect seaws 
2.1 (or 2.39). The raw average superiority is 23.46 − 17.82 or 5.64.
So 2.39 − 5.64 = 1.39 VIII − (.04 × 2.42) − (.06 × −.39) − (.13 ×
.09) + (.42 × 2.29). From this, 1.39 VIII = −4.13 and 1.00 VIII
= −2.97. A similar treatment, but using the corrected values, gives
1.00 VIII as −2.43.
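
The step from the equation to the value of 1.00 VIII is a single rearrangement, and can be retraced as follows. This Python sketch is an added illustration, not the original worksheet; the variable names are assumptions, and the value of the replaced courses is entered as the rounded 2.39 printed in the text rather than the unrounded product 1.14 × 2.1.

```python
# Raw value, by method II, of the 1.14 courses of III, V, VI and IX that C lacks
# (1.14 x 2.1, entered as the rounded figure printed in the text).
value_of_replaced_courses = 2.39

# Observed raw superiority of the composite over C.
observed_superiority = 23.46 - 17.82

# Effect of the remaining program differences, with the signs used in the text:
# civics, dramatic art, physical training, bookkeeping and arithmetic.
other_terms = -(0.04 * 2.42) - (0.06 * -0.39) - (0.13 * 0.09) + (0.42 * 2.29)

# Rearranging  2.39 - 5.64 = 1.39 VIII + other_terms  for the VIII term.
viii_for_1_39_courses = value_of_replaced_courses - observed_superiority - other_terms
viii_per_course = viii_for_1_39_courses / 1.39

print(round(viii_for_1_39_courses, 2), round(viii_per_course, 2))
```

Replacing the raw course values by the corrected ones would give the corrected estimate in the same manner.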

The average of V VI and V IX, with equal weight attached to each 
group, gives the results shown in Table XXIV. C M differs from this 
average in having 1.98 of VIII in place of 2.22 of V or VI or IX, in 
having .01 more of civics, .04 less of biology, .02 less of dramatic art, 
.17 less of physical training, .36 more of arithmetic and bookkeeping,
.06 more of manual training, and .16 more of neutral subjects. The
raw value of 2.22 of V or VI or IX by method II is 7.1. The raw
average superiority is 26.78 − 18.26, or 8.52. So 7.1 − 8.52 = 1.98
VIII + (.01 × 2.42) − (.04 × −.20) − (.02 × −.39) − (.17 × .09) +
(.36 × 2.29) + (.06 × .30). From this, 1.98 VIII = −2.29 and 1.00
VIII = −1.16. A similar treatment, but using corrected values,¹ gives
1.00 VIII as −1.03.

As the average of the two determinations we have −2.07 as raw and
−1.73 as corrected values for VIII.²

For all the groups of studies which have been treated by the two 
methods, we have, then, the following: 











                  Raw results                Estimated results for pupils
                                             equalized as to initial ability
                                             in the tests

          Method I  Method II  Average     Method I  Method II  Average
III ....    − .2      −1.2      − .7         − .1½     −1.0      − .6
V ......     3.5       4.2       3.9          3.0       4.6       3.8
VI .....     1.6       2.3       2.0           .8       1.2       1.0
VIII ...    − .8      −2.1      −1.5         − .1½     −1.7      − .9
IX .....     3.3       3.0       3.2          2.7       1.6       2.2























The results by method II in general differ from the results by 
method I in attaching less value to the study of the physical sciences, 
less to the study of the biological sciences, a little more value to the 
study of Latin and French, considerably more to the study of algebra, 





1 Calling manual training +.50. 
2 These are probably too low because of the fact noted above about bookkeeping; 
— 1.8 and —1.5 will probably in the end be found to be nearer the truth. 
geometry and trigonometry, and considerably less to the study of
stenography and typewriting, cooking and sewing.

Besides serving as a check upon method I, method II provides 
important direct measurements comparing C, M and C M, which 
have no mathematics, science or foreign language (except a little 
Spanish) with V VI and V IX and VI IX and V VI IX, which are 
very rich in these and have no commercial or manual studies. The 
facts are: 











                                   n      Average          Average
                                          initial score    gain
V VI ..........................  1210       196.7           27.7
V IX ..........................   643       199.7           26.9
VI IX .........................   283       214.3           25.7
V VI IX .......................   455       205.3           28.2
Commercial ....................   826       179.7           17.8
Manual ........................   192       166.5           18.6
Commercial or manual ..........   548       174.1           18.3














There is a difference of about 10 between relatively dull pupils 
taking the least intellectualistic programs which high-schools offer 
and relatively bright pupils taking three-fifths of their work in mathe- 
matics, Latin (or more rarely French), and physical science. This 
difference would be reduced to about 7.0 if the pupils concerned were 
equalized in respect of intellect, and would probably be reduced further 
if the pupils were equalized in all respects. We may reasonably set 
8 as an upper limit and 4 as a lower limit to the intrinsic effect of three 
courses in mathematics, Latin and physical sciences vs. that of three 
courses in stenography, typewriting, cooking and sewing. Part of 
this effect is specific in the sense that it is confined to tests with 
verbal, mathematical and spatial data, and would not appear if the 
tests were with situations of, say, business, politics, or family life. 

Certain features of the records make one suspect that selective 
forces are at work picking pupils who will gain in intellect to take 
certain studies beyond those forces measured by initial ability and more 
or less well allowed for by the corrections made in 1922-1923 for sex 
and initial ability, and in 1925-1926 for initial ability. For example, 
the corrected value of a course in V plus a course in VI plus a course in 
IX is 6.5 by method I and 7.4 by method II. But when we compare 
pupils who take two of these courses with pupils who take only one, 
the value of three such added courses (1.01 of V, 1.06 of VI, and 1.29 
of IX) is only 5.7. And when we compare pupils who take all three of
these courses with pupils who take only one, the value of the three
(1.01½ V + .99½ VI + 1.02 IX) is only 4.7. It is, of course,
conceivable and possible that the training value of these courses might
follow a law of diminishing returns, whereby taking two courses had 
less than twice the effect of taking one; but it seems still more probable 
that their selective power would do so. 

The derivation of these quantities is as follows: The sum of the V,
VI and IX average gains, corrected by .085, is 68.65 (from Table
XXIII). The sum of the corresponding V VI, V IX and VI IX average
gains is 74.32. The second set of programs differs from the first in
that 1.06 of V, 1.06 of VI and 1.29 of IX and .21 physical training
replace 1.26 of English, history, drawing, music, Spanish, .01 of
miscellaneous, .13 of II, .04 of III, and .11 dramatic art. The .21
physical training and .05 of V balance the combined effect of the .13
II, .04 III and .11 dramatic art. So 1.01 of V, 1.06 of VI and 1.29 of
IX produce, in replacing I or nothing, 74.32 − 68.65 + .0031 (or 5.7).

The average gain for V VI IX multiplied by three is 78.09.
Three times the V VI IX program differs from the sum of the V and
VI and IX programs in that 2.60 V + 1.99 VI + 2.04 IX and .09
physical training replace 2.10 of English, history, drawing, music
and Spanish, .66 of II, .06 III and .24 dramatic art. .57 of the V and
the .09 physical training balance the total effect of the .66 II, .06 III
and .24 dramatic art. We obtain the effect of the .57 of V and the .09
of physical training by (.57 × 2.99) + (.09 × .83) = 1.78. We
obtain the effect of the .66 II, .06 III and .24 dramatic art by (.66 ×
2.89) + (.06 × −.15) + (.24 × −.48) = 1.79. So 2.03 V + 1.99
VI + 2.04 IX, in replacing I or nothing, here produce 78.09 − 68.75
or 9.33. 1.01½ V + .99½ VI + 1.02 IX produce 4.7.

We do not know what these suspected selective forces are; but we 
can make more or less reasonable hypotheses. For example, the pupils 
who take such intellectualistic subjects as Latin, mathematics and the 
physical sciences, as compared with the pupils of equal ability in the 
intelligence test who take commercial and manual subjects, are prob- 
ably more ambitious for intellectual advancement, fonder of intellec- 
tual pursuits, and from more intellectual homes. Their lives outside 
of school are consequently probably more occupied with selective and 
relational thinking, generalization and organization with words, num- 
bers and spatial facts and relations than the extra-school lives of stu- 
dents who take typewriting, sewing and the like in school. 

If we had the V VI IX group study C and M programs and had
the commercial and manual groups study Latin and mathematics and 
physics for a year, after first picking representatives exactly paired 
for intelligence of the sort measured by the test, it might well be that 
the superiority of the V VI or V VI IX group would be less than that 
estimated by method I or method II. 

It is of importance to know how far the superiority shown by 
certain studies is general for all intellectual tasks, and how far it is 
specialized or limited to tasks with words, numbers and space relations 
such as made up the tests used in our inquiry. We cannot, of course, 
determine the effect of one rather than another course of study upon, 
say, ability to solve business problems or to meet social emergencies 
without special tests ad hoc. But we can estimate the extent to which 
the superiority of one course over another is limited to the sort of tasks 
used in our tests by measuring the extent to which it is limited to one 
of the three sorts used therein; namely, verbal, numerical, and spatial. 

We have made a beginning with such measurements by measuring 
the gain in each of these three sorts of tasks for the groups designated 
III, V, VI, manual, commercial, III V, and III VI in the preceding 
discussion. The facts are presented in Table XXV. As elsewhere, 
the very great individual variations in gain prevent any clear result 
until we have large populations; and even when the results are com- 
bined for our entire populations, the results, though clear, are not very 
reliable as to their precise amounts. 

Consider first V, VI and M. These three program-groups are 
much alike in the gain in the spatial tests, the inferiority of the manual 
groups being restricted largely to the tests with words and numbers. 
V and VI are about alike in total gain, but the pupils taking V make 
much more of their gain in numerical tests than in verbal, and vice 
versa for those taking VI. 

Allowing equal weight to the 1, 1922-1923, the 2, 1922-1923, the 
1, 1925-1926 and the 2, 1925-1926 measurements we find a superiority 
of V over VI in gain in numerical tasks amounting to 1.90 points, and 
in space-relations tasks amounting to .23 points, whereas there is an 
inferiority of V to VI in verbal tasks amounting to 2.72. The supe- 
riority of V over manual is 1.20 in tasks with words, 2.47 in tasks with 
numbers and .30 in tasks with space-relations. The superiority of VI
over manual is constituted very differently, being 3.92 in verbal, .57 in
numerical and only .07 in spatial tasks. The gain due to V is thus a
very different thing from the gain due to VI. The former is chiefly in
numerical and spatial tasks; the latter is chiefly in verbal.

In general, the gains are specialized to a notable extent; words, 
numbers and spatial tests do not show similar gains regardless of the 
subject. Two-thirds of the superiority of the manual to the com- 
mercial group is in space tests. None of the superiority of III to the 


TABLE XXV. THE GAINS MADE BY CERTAIN GROUPS IN THE VERBAL,
NUMERICAL, AND SPATIAL TASKS OF THE I. E. R. TESTS OF SELECTIVE AND
RELATIONAL THINKING, GENERALIZATION AND ORGANIZATION

Gains

Group          Date         n       Wo.      Nu.      Sp.
V              '22, '23    264      8.44     6.24     6.24
               '25, '26    377     10.84     7.98     6.70
VI             '22, '23    139     11.85     5.46     6.19
               '25, '26     83     12.87     4.96     6.29
III V          '22, '23    112     10.70     4.60     6.07
               '25, '26    139     10.14     5.90     6.11
III VI         '22, '23    111     12.53     3.05     4.75
               '25, '26     88      8.69     1.38     4.88
III            '22, '23     72     12.30     4.34     4.62
               '25, '26    104      6.79     5.21     4.82
Manual         '22, '23     70      7.29     7.69     5.90
               '25, '26    122      9.59     1.58     6.44
Commercial     '22, '23    490      8.33     5.11     5.13
               '25, '26    335      8.46     2.77     4.47
V              Average     ...      9.64     7.11     6.47
VI             Average     ...     12.36     5.21     6.24
III V          Average     ...     10.42     5.25     6.09
III VI         Average     ...     10.61     2.22     4.82
III            Average     ...      9.55     4.78     4.72
Manual         Average     ...      8.44     4.64     6.17
Commercial     Average     ...      8.40     3.94     4.80

Superiorities

Group                              Wo.      Nu.      Sp.
V over VI ......................  −2.72     1.90      .23
III V over III VI ..............  − .19     3.03     1.28
V over III .....................    .09     2.33     1.75
V over manual ..................   1.20     2.47      .30
V over commercial ..............   1.24     3.17     1.67
VI over III ....................   2.81      .43     1.52
VI over manual .................   3.92      .57      .07
VI over commercial .............   3.96     1.27     1.44
III over manual ................   1.11      .14    −1.45
III over commercial ............   1.15      .84    − .08
Manual over commercial .........    .04      .70     1.37























commercial group is in space tests. The average superiority of V to 
III, manual and commercial is .84 in words, 2.66 in numbers and 1.24 
in space tests. The average superiority of VI to III, manual and 
commercial is 3.56 in words, .76 in numbers and 1.01 in space tests. 
If we had used only the word tests we should have been led to believe 
that the III group gained 99 per cent as much as the V group, and that 
the VI group gained about 28 per cent more than the V group. If we 
had used only the space tests, we should have found V, VI, and manual 
nearly indistinguishable in gain. 
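
The averages and percentages quoted in this paragraph follow from the "Average" rows of Table XXV. The Python sketch below is an added illustration; the names `AVERAGE_GAIN`, `superiority` and `mean_superiority` are assumptions introduced here, and the figures are the average gains in words, numbers and space tests as read from the table.

```python
# Average gains (words, numbers, space) from the "Average" rows of Table XXV.
AVERAGE_GAIN = {
    "V": (9.64, 7.11, 6.47),
    "VI": (12.36, 5.21, 6.24),
    "III": (9.55, 4.78, 4.72),
    "manual": (8.44, 4.64, 6.17),
    "commercial": (8.40, 3.94, 4.80),
}


def superiority(a: str, b: str) -> tuple:
    """Gain of group a minus gain of group b, for each kind of test."""
    return tuple(x - y for x, y in zip(AVERAGE_GAIN[a], AVERAGE_GAIN[b]))


def mean_superiority(a: str, others: tuple) -> tuple:
    """Average superiority of group a over several comparison groups."""
    diffs = [superiority(a, b) for b in others]
    return tuple(round(sum(column) / len(column), 2) for column in zip(*diffs))


others = ("III", "manual", "commercial")
v_mean = mean_superiority("V", others)
vi_mean = mean_superiority("VI", others)

# Word tests only: III as a percentage of V, and the excess of VI over V.
iii_percent_of_v = round(100 * AVERAGE_GAIN["III"][0] / AVERAGE_GAIN["V"][0])
vi_percent_above_v = round(100 * (AVERAGE_GAIN["VI"][0] / AVERAGE_GAIN["V"][0] - 1))

print(v_mean, vi_mean, iii_percent_of_v, vi_percent_above_v)
```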

The superior gains which we have found associated with taking
certain studies are then surely not to be regarded as consisting entirely 
in a general improvement for thinking with all sorts of data. 


Note A. ON THE DIFFERENCE BETWEEN LATIN AND FRENCH IN 
EFFECT UPON GAIN IN THE SCORE IN THE INTELLIGENCE 
EXAMINATION 


Latin and French have been grouped together because the pre- 
liminary inquiry of the earlier study indicated that they were not 
much different in their effects upon gain in intelligence score, and 
because it was desirable to have a few comparisons with large groups 
rather than many comparisons with small groups. With the added 
cases of the 1925-1926 experiments available, it is possible to measure 
the difference between Latin and French in respect of gain in score; 
and we have done this. 

We split the VI group used in method II into three groups, L, F 
and LF, meaning, of course, pupils who took Latin, pupils who took 
French, and pupils who took both Latin and French. We split the 
V VI group into VL, VF and VLF. We split the III VI group into 
III L, III F, and III LF. We split the VI IX group into IX L, IX F,
and IX LF. We then compare L, VL, III L and IX L with F, VF, 
III F and IX F as to the balance of the program, the initial score, and 
the gain. 

The facts as to program are shown in Table XXVI. The L’s 
have per person .09 of a course more II, .29 of a course more V, .18 
of a course more physical training, .13 of a course less III, and .10 
of a course less IX. The difference in English, history, drawing, 
music and Spanish together is very slight, amounting to .08 of a course 
more for the F’s. The L’s are thus at an advantage in the accessory 
subjects. By these alone the L’s should average 1.024 greater gain
than the F’s, from (.09 × 2.89) + (.29 × 2.99) + (.18 × .82) −
(.13 × −.15) − (.10 × 2.71).

The facts as to initial ability and gain are shown in Table XXVII. 
If we weight each determination of L’s superiority by the smaller of the 
two populations involved, the weighted average raw superiority is 
.33; the weighted average superiority after correction for the greater 
initial ability of the L’s is .24. The observed difference in gain is 
thus less than should have been produced by the difference in the 
accompanying subjects. So there seems to be no essential superiority 
of Latin to French in respect of influence upon gain in the tests used. 
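
The weighting rule used here, in which each determination of the L superiority counts in proportion to the smaller of the two populations compared, can be sketched in Python as follows. This is an added illustration only: the function name is an assumption, and the figures in `example` are hypothetical placeholders, not the entries of Table XXVII.

```python
def weighted_average_superiority(determinations):
    """Average the superiorities, weighting each by the smaller population.

    Each determination is (superiority of L over F, number of L pupils,
    number of F pupils).
    """
    total_weight = 0
    weighted_sum = 0.0
    for superiority, n_latin, n_french in determinations:
        weight = min(n_latin, n_french)  # the smaller of the two groups
        weighted_sum += superiority * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("at least one determination with pupils is required")
    return weighted_sum / total_weight


# Hypothetical figures for illustration only.
example = [(0.5, 100, 50), (0.2, 30, 60)]
result = weighted_average_superiority(example)
print(result)
```

Weighting by the smaller group keeps a comparison from counting heavily merely because one of its two groups happens to be large.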

LAWS OF LEARNING
PERCIVAL M. SYMONDS 
Teachers College, Columbia University 


This paper presents a list of 23 descriptions of learning,
generalizations which may be called laws of learning. They are all in the
language of the conditioned reaction. The material for these laws was
obtained from Chapter XIV, “Conditioned Reflexes,” in Recent
Advances in Physiology by C. Lovatt Evans. Evans states in this
chapter, “Psychological methods based on introspection, together with
a statistical study of animal behavior, represented almost the only lines
of attack on the question of function, and these methods, owing partly
to their lack of connection with the already established facts regarding
the physiology of the lower parts, and partly to the difficulty of proper
control, led rather to the construction of elaborate technical
vocabularies than to physiological explanations of function.” But on reading
further in the chapter, I discovered that Evans had in reality given a
remarkably clear and concise description of the laws of formation of
conditioned reflexes which reduce themselves to a statement of the
laws of learning. I have transcribed his descriptions into categorical
“laws” and have offered brief illustrations of their interpretation in
educational problems. Evans has drawn from a variety of sources
but most particularly from Pavlov and his students. Pavlov’s work
is only available in Russian in a book entitled Twenty Years’ Objective
Study of the Higher Centres of Nervous Activity (Behavior) of Animals,
so that Professor Evans has done a distinct service in making available
a summary of his findings. My own endeavor in the present
restatement is to make available for educational psychologists the work of the
physiologists. Many of the laws which follow have long been stated
in our own texts on learning, but some are new and deserve
experimental verification.

1. The fundamental law of all learning is the law of the conditioned 
reaction: Stimulation of any receptor organ occurring simultaneously with 
a reaction of an effector organ leads to the formation of a new stimulus- 
response bond. The classic example of the conditioned reaction is 
Pavlov’s dog. It is well known that the sight or smell of food is a 
sufficient stimulus to excite the salivary glands in the mouth. Pavlov 
arranged the conditions so that he could measure the amount of saliva 
secreted. Then he proceeded to stimulate the glands by presenting 
food, at the same time ringing a bell. Eventually the ringing of the
bell alone would cause the saliva to flow. It has been demonstrated 
by Watson, Mrs. Jones and others that fear reactions in infants may 
be conditioned in similar manner. Suppose that a baby is frightened 
by loud noises. Then present to the baby a dog and let the dog bark. 
The child becomes frightened at the noise (barking). But the dog is 
also a visual stimulus occurring simultaneously with the auditory 
stimulus which caused the fear reaction. On another occasion the 
sight of the dog alone is sufficient to arouse fear responses. This is a 
learned reaction. Or again, a small boy goes to the zoo. There he
sees an animal and his mother pronounces the name of the animal,
“lion,” which the boy repeats. On another occasion merely seeing the
lion is sufficient to cause the boy to say “lion.” (In this last example
there also has to be present in the first instance some stimulus to make
the boy say something.)

2. Repetition is necessary before the conditioned reaction is formed. 
This is the law of use so well expressed by Thorndike. In the case of 
Pavlov’s dog the bell had to be rung simultaneously with the
presentation of the food many times before the bell alone would be a
sufficient stimulus to arouse the reaction of the salivary glands. Seemingly in
the case of human reactions learning takes place at the first occurrence 
as illustrated by learning the name lion. But experiments in learning 
words in foreign languages show that repetition is necessary. The law 
of use seems to be a fundamental law of learning. 

3. If the unconditioned stimulus precedes the conditioned stimulus, 
there is no new reflex formed. If the food is presented and then the 
bell is sounded no reflex is formed. 

4. If the conditioned stimulus precedes the unconditioned stimulus, a 
new reflex is formed. If the bell is sounded and then the food is
presented at a later instant, the reflex “bell sounding, saliva flows” is
formed. If the conditioned stimulus is continued throughout the
interval, the resulting reflex is known as a delayed conditioned reflex.
If the conditioned stimulus stops before the unconditioned stimulus is
given, the resulting learned reflex is known as a trace reflex.
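
The timing distinctions in laws 3 and 4 can be set out as a small decision rule. The Python sketch below is an added illustration, not part of Evans's or Pavlov's account; the function name and the representation of each stimulus by onset and offset times in seconds are assumptions made here, and the rule covers only the cases named in the text.

```python
def classify_pairing(cs_onset: float, cs_offset: float, us_onset: float) -> str:
    """Classify one pairing of a conditioned stimulus (CS, e.g. the bell)
    with an unconditioned stimulus (US, e.g. the food) by their timing."""
    if cs_offset < cs_onset:
        raise ValueError("the conditioned stimulus cannot end before it begins")
    if us_onset < cs_onset:
        # Law 3: the unconditioned stimulus comes first.
        return "no new reflex"
    if us_onset == cs_onset:
        # Law 1: the two stimuli occur simultaneously.
        return "simultaneous conditioned reflex"
    if cs_offset >= us_onset:
        # Law 4: the CS continues throughout the interval.
        return "delayed conditioned reflex"
    # Law 4: the CS stops before the US is given.
    return "trace reflex"


print(classify_pairing(cs_onset=0.0, cs_offset=5.0, us_onset=3.0))
print(classify_pairing(cs_onset=0.0, cs_offset=2.0, us_onset=3.0))
```

In the classroom applications that follow, the unfamiliar element plays the part of the conditioned stimulus and the familiar element that of the unconditioned stimulus.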

These two laws have an important bearing on educational
procedure which may be summarized as follows. In learning an
association, present the unfamiliar element first. For instance in learning
words in a new language, present the word in the foreign language 
before presenting it in the vernacular. Or if one is teaching the mean- 
ing of words in a foreign language by means of pictures, objects, or 
acts, present the new and unfamiliar word simultaneously with or
before the picture, object, or act. 

One may have difficulty in teaching a child to spread his napkin 
in his lap by reminding him of it after the meal has begun. Teaching
the habit will be much easier if he is reminded to spread the
napkin before the meal or just as he sits down at the table, so that the
act may become associated with the act of sitting down.

The skilful teacher in history who is trying to link the past with 
the present will present first the historical incident before presenting 
the more familiar present condition. This is probably the typical 
method of teaching—to present the new word, the unfamiliar situa- 
tion, the strange incident, and then to refer it to something familiar— 
so that recent research in this case is giving a scientific basis to estab- 
lished practice. 

5. Simultaneous conditioned reflexes are formed more quickly. 
The new and the old should be presented as closely together in space 
and time as possible. If a new word is being learned, it should be 
presented adjacent to its synonym, the foreign word coming first if 
anything. If the association is between two visual stimuli, they should 
be placed as close together as possible so that the time interval between 
them is merely that represented by the time of the eye movement from 
one to another. This probably is an explanation of the value of a 
graphic representation for facilitating the portrayal of facts. It would 
seem as though the essential feature of effective presentation is the 
time necessary to make the comparison, and spatial contiguity is a
most important factor. This would suggest that visual presentation 
should in general be more effective than auditory presentation. The 
time interval in visual presentation is limited only by the speed of eye 
movement. The time interval in auditory presentation is of necessity 
longer because one word cannot be spoken until another is 
finished. Probably the most effective form of presentation is a 
combination of visual and auditory for then simultaneity can be 
approximated. 

6. The latent period of response (time interval between stimulus and 
response) in a conditioned response is the same as the interval between the 
two stimuli. When a child is learning to obey his mother’s commands 
every effort should be made to see that the effective (unconditioned) 
stimulus does not follow too long after the voice (conditioned stimulus). 
For instance, suppose that the child is about to touch a hot radiator. 
If the child is young and has not learned to respond to language, the
mother’s “Do not touch” will have no effect. If the mother sees the
child going toward the radiator to touch it and says “Do not touch”
and the child touches it and gets burned, then the next time the
mother says “Do not touch” the child will respond to her words by
withdrawing even if the object is something other than a radiator.
If the mother says “Do not touch” at the exact instant or a second
before the child touches the radiator, learning takes place rapidly (5)
and response to a subsequent “Do not touch” takes place immediately.
But if the mother says “Do not touch” a number of seconds
before the radiator is touched, learning takes place slowly and
subsequent responses are made less promptly. In general where obedience
is being taught one should see to it in the early stages of learning
that the response is made promptly following the request or command.
It will be found helpful in this regard to verbalize familiar acts. If 
the mother talks in the form of commands while the baby is eating or 
doing other habitual activities, the habit of responding promptly to 
a command is more easily and surely learned. 

This principle also applies to punishments. Learning from touch- 
ing a hot radiator to avoid the radiator is a form of natural punishment. 
The unconditioned reaction is withdrawing following the stimulus 
burning. Since the two stimuli, burning and seeing, operate simul- 
taneously learning is immediate and effective. On the next occasion, 
withdrawing is an immediate response to seeing. But if a child leaves 
a toy out overnight and it becomes ruined, which also is a punish- 
ment, learning does not necessarily take place. In this case the 
unconditioned response is the response to the ruined toy. But the con- 
ditioned stimulus, going in at night, is too far removed to become 
an effective stimulus to taking in the toy. This learning, if it takes 
place at all, must be through the medium of language which provides 
substitute stimuli. 

In general punishments should follow immediately upon the 
act which is to be changed. 

7. The rate at which the conditioned reflex becomes established is 
correlated with the strength of the conditioned stimulus. In general 
in school, learning depends more than is usually realized upon clearness 
and vividness of stimuli. Much of the effectiveness in learning depends 
on good light, on clear print, good blackboard illustrations. The 
teacher’s voice and enunciation are also important factors. The baby 
who is learning to respond to its mother’s voice makes this learning 
more quickly if the voice is clear, loud, and distinct. The army has
learned the value of sharp, clear, emphatic commands as a factor in
the speedy learning of the manual of arms and marching tactics. 

7a. A very strong conditioned stimulus delays the function of the
reflex. The reason for this is that too strong a stimulus probably
calls out other responses, often emotional, which may interfere with 
the learning. For this reason light should not be blinding, or a teacher 
should not suddenly shout, as the shocks following these sudden 
unexpected strong stimuli are detrimental to learning. 

8. The vigor or strength of the conditioned response is proportional
to the strength of the unconditioned stimulus. This law should be com- 
pared with the preceding law. The speed of learning is proportional to 
the strength of the conditioned stimulus; the strength of the learning is 
proportional to the strength of the unconditioned stimulus. New 
words in a foreign language are more surely learned if they are asso- 
ciated with the best-known synonym in English. Events and per- 
sonages in history or places and processes in geography are the better 
remembered if they are associated with the most commonplace or 
familiar or contemporary events, personages, or processes. This is the 
basis for the use of illustrations, models, diagrams, demonstrations, 
etc., in teaching. 

9. Discontinuous conditioned stimuli lead to more rapid learning than 
continuous conditioned stimuli. This, in part, is the basis of the prac- 
tice of spaced learning. 

10. The reflex resulting from two simultaneous conditioned stimuli is 
more powerful than when one is used. The law of summation. This law 
relates to the magnitude of the response. Illustrations of this are 
shown in experiments on distraction. Distractions, instead of lower- 
ing the quantity of mental work, often increase it. If noise, 
confusion, bustle, and the presence of other people are conditioned 
with certain reactions, then their presence increases the strength of 
those reactions. 

11. A conditioned reflex established to any given stimulus does not 
lead to formation of other conditioned reflexes. The law of specificity. 
This law is of great importance in education. It denies categorically 
the possibility of transfer. Transfer only takes place when a new 
conditioned reflex is formed, in the regular way. Apparent cases of 
transfer take place only because some tenuous, highly aerated aspect or 
feature of the situation is the conditioned stimulus. 

12. If a conditioned response is not exercised for a considerable 
period of time its strength is impaired. The law of forgetting.

12a. Though weakened through disuse conditioned reflexes will be
found present after a lapse of time and can be brought to original strength
in a much shorter time than is required for the development of new reflexes.

13. If a conditioned stimulus is applied several times in succession 
without the accompaniment of the unconditioned stimulus, the reflex 
undergoes rapid diminution in strength, and can in this way quickly 
be extinguished. The law of fatigue. This law has important educa- 
tional implications. The child who has formed a response to the 
words “Don’t touch” will soon lose the conditioned response if the
command is often repeated. So the teacher who repeats “quiet,” or
“attention” will soon find that her words have lost their potency.
This law illustrates the ineffectiveness of nagging. The rule applies 
to all forms of learned activity. With continued application almost 
any learned activity can be dulled or extinguished by repeated appli- 
cation. Note that it is not mere repetition, but repetition without the 
accompaniment of the unconditioned stimulus that diminishes the 
strength of the reaction. With sufficient motivation fatigue does not 
take place. 

13a. The shorter the interval between the isolated applications of the 
conditioned stimulus, the quicker will be the extinction. 

13b. Delayed and trace reflexes undergo a much more rapid process
of extinction than the corresponding simultaneous reflexes. 

14. A conditioned reflex which has undergone such extinction, 
regenerates spontaneously after an interval of a few hours. This would 
seem to be another basis for spaced learning. 

15. Repeated extinction has a much more profound effect and the 
spontaneous regeneration is slower and less complete. When this has 
happened it may take a longer time to re-establish the response than even 
for the formation of a new response. Laws 13, 14, and 15 are the basis 
of the elimination of reactions by familiarity. A good example is 
the elimination of fear reactions by repeating the conditioned stimulus. 
If a child has learned to be afraid of dogs by a process of conditioning, 
this fear can be eliminated by the frequent presence of a dog, making 
sure that no other fear stimuli are present. “Familiarity breeds
contempt” is a proverbial expression of this law. By a similar process
we learn to ignore the disagreeable features of food, sounds, odors, etc. 

16. Extinction of a conditioned reaction for a particular stimulus 
has no effect upon the strength of other conditioned reflexes. Laws 
13, 14, and 15 form the basis of discrimination. When a conditioned
reflex is formed there are besides the potent stimulus a
great variety of accessory stimuli. Some of these are irradiations in
the same sense organ. (By irradiation is meant the response to 
similar modes of stimulation as to a spot on the skin near the spot 
originally used as a stimulus, or a tone within 200 or 300 vibrations of 
the tone originally used as a stimulus.) Some are merely coincident 
stimuli of temperature, light, movement of air, sound, etc. If these 
accessory stimuli are eliminated by repeated application without the 
accompaniment of the unconditioned stimulus, the conditioned reac- 
tion then is made to a very specific stimulus. In human beings this 
leads to highly discriminative responses which the writer has elsewhere
called “confacts.” These confacts may be verbalized and are then
called concepts.

17. If during or shortly before the occurrence of a conditioned reflex an
additional stimulus is presented, the conditioned reflex becomes weaker. 

17a. The stronger this additional or extra stimulus, the greater its 
inhibitory effect. 

17b. This inhibition is only temporary.

17c. On repetition of the extra stimulus its inhibitory effect becomes
smaller. (The phenomenon is known as external inhibition.) The
disturbing stimulus may be very slight yet wield its influence. An
example of this is stage fright. New and strange situations may be
quite effective in blocking or breaking habitual forms of response in
young children. A new person in the house may upset altogether a
child’s normal responses to its toys, its mother, and familiar objects.

Internal inhibition has also been described in law 13. Discrimina- 
tion is a special case of internal inhibition in which reaction is learned 
to very specific stimuli. The latent period in trace and delayed 
reflexes also represents a form of internal inhibition. 

18. Inactivity which results from the repetition of the conditioned 
stimulus without the unconditioned stimulus (law 13) (also discrimination) 
and inactivity in delayed and trace reactions are in themselves conditioned 
reactions—a state of acquired inactivity. The different internal inhibi- 
tions undergo summation (law 10), they may be experimentally extin- 
guished (made to reappear), (law 13), the stimuli causing internal 
inhibitions may be discriminated from each other and they can be made to 
reappear by the application of an extra stimulus (law 17). This
concept of inhibition as a positive form of reaction—a response by
means of inactivity—is helpful. It indicates that inhibitions are 
learned and broken by exactly the same means that positive reactions 
are learned and broken. Much of conduct education is in the form of
inhibitions. Children are taught to avoid putting injurious objects
in their mouths, getting dirt on their hands and clothes, leaving articles 
about, crossing the street before stopping to look for automobiles, 
etc., etc. The application of laws 13 and 17 to the elimination of 
inhibitions is especially important as much of mental hygiene consists 
in the destroying of inhibitions. The art of the psycho-analyst seems
to be that of eliminating the unconditioned stimulus which per- 
sistently reinforces the conditioned reaction. 

19. If a stimulus has been positively conditioned to a certain reaction 
and another stimulus has been conditioned to the absence of that reaction, 
an application of the latter stimulus reduces the strength of the former 
stimulus. 

19a. The shorter the time interval between the two the greater the effect. 

As an illustration of this law consider the child who has learned to 
fear a dog. If the dog is held by the mother the fear reaction is 
lessened. In this case the mother as a stimulus does not produce 
fear. So the presence of the mother tends to lessen the fear reaction 
to the dog. (In like manner there is a slight tendency to condition 
the response of fear with the mother.) However, this law may be used 
in destroying positive reactions or inhibitions. 

20. Conditioned stimuli may serve as unconditioned stimuli in sub- 
sequent reactions to neutral stimuli. This indicates that the process 
of learning may go on to any length, the process of conditioning trans- 
ferring from one stimulus to another. 

21. The process of establishment of inhibitions, particularly the 
establishment of very fine discriminations leads to a disturbing effect on 
other reactions and inhibitions and if extreme will result in a general 
excitability and neurotic state. 

22. Internal organic states, as well as glandular secretion, influence 
the strength of conditioned reflexes and the rate at which they may be 
learned. 

23. Conditioned reflexes may be more rapidly established in young 
people than in old people. 

These laws, describing learning in terms of the conditioned reflex, 
offer a restatement of most of the recognized laws of learning. One 
law—the so-called law of effect—does not appear from the studies on 
the conditioned reflex. Although the law of effect expresses a true 
description of the learning process, it is probable that it is of a different 
nature from the laws above described which describe the formation 
of conditioned reactions. The law of effect seems to describe the
influence of one reaction on another, particularly the influence of
reactions to peripheral stimuli on reactions to organic and visceral 
stimuli. Just as its name implies, this law represents the effect of one 
set of reactions on another set of reactions. Utilization of the law of 
effect in the control and guidance of learning is not so much a matter 
of understanding the way in which learning takes place as by the keen 
and skilful analysis of desires, motives, and drives, and the means by 
which they may be satisfied by skills and habits. 

The better control of education seems to imply two things, first 
the analysis of reactions into their constituent conditioned reactions. 
In other words there is need for more penetrating analyses of separate 
stimuli and responses in any form of learning. This analysis, which 
is often difficult to make, may sometimes best be approached by 
noting the possibilities for error or for aberration. A complete list 
of possibilities for error coincides with a complete list of the separate 
unitary reactions in the habit or skill which is being learned. Particu- 
larly important in the field of conduct education is an analysis of the 
combination of the different senses as fields of impression, and of the 
muscle groups as fields of expression. Outstanding among these is 
the relationship between the verbal, both as stimulus and response 
and other types of reaction. The second means of control of education 
is the application of the laws of learning to the facilitation of the forma- 
tion of these conditioned reactions into groups of habits or skills. 





ANALYSIS OF DISCREPANCIES BETWEEN TRUE- 
FALSE AND SIMPLE RECALL EXAMINATIONS 


H. L. ARNOLD


Humboldt State Teachers College, Arcata, California


Introductory—Within the past five years evidence has been accu- 
mulating to the effect that discrepancies exist between the true-false 
and the simple recall types of examinations (R1) (R2) (R3). The pur- 
pose of this paper is to report the results of an investigation which 
was conducted in an attempt to specify and analyze certain discrepan- 
cies. The data for this report were obtained from tests given to 
teacher-training classes in the Humboldt State Teachers College of 
Arcata, California. 


Part I 


Preliminary Investigation.—In order to obtain cues for the main
investigation, a list of questions covering 100 recall points was
prepared and labeled “Test (a).” The material was taken from technical
geography in which the subjects had had no homogeneous training.

This list was re-cast in the form of a true-false examination in
which the false statements were quite ridiculous, and was labeled
“Test (b).” The list was again re-cast in the form of a true-false
examination in which the false statements were quite close to the
truth, i.e., less ridiculous. This list was labeled “Test (c).” In each
case, the numbers of true and false statements were equal.

Samples of the Test.

Test (a)

Name the five largest cities of the world in order beginning with the largest.
Bound the state of Colorado.
Name the states which have territory touching the west bank of the Mississippi river.

Test (b)

1. Sacramento is the largest city in the world......
18. North Dakota bounds Colorado on the south......
19. Oregon bounds Colorado on the west......
30. Idaho touches the west bank of the Mississippi river......

Test (c)

1. London is the largest city in the world......
18. Arizona bounds Colorado on the south......
19. Nevada bounds Colorado on the west......
30. Louisiana does not touch the west bank of the Mississippi river......

On a certain day “Test (a)” was given to section X which consisted 
of 30 pupils. The papers were collected at the end of 40 minutes as 
previously announced. “Test (b)” was distributed, and collected at
the end of 20 minutes. All had finished before the time limit. On the 
following day tests (a) and (c) were given to section Y, the technique 
being the same as for section X. Section Y consisted of 34 pupils. 
All except 3 finished within the required time. No advice was given 
concerning guessing. The results follow: 


TABLE I

Section X (False Statements were More Ridiculous)

                                          TOTAL SCORE    MEAN    SIGMA
Recall “Test (a)”                             1181        39     13.6
True-false “Test (b)” rights                  2024        67      8.1
True-false “Test (b)” (rights-wrongs)         1215        40     14.4

Total false statements marked “true,” 412.
Total true statements marked “false,” 397.
Correlation between recall and true-false corrected, .437 (PE .099).

Section Y (False Statements were Less Ridiculous)

                                          TOTAL SCORE    MEAN    SIGMA
Recall “Test (a)”                              852        25     10.2
True-false “Test (c)” rights                  1740        51     10.3
True-false “Test (c)” (rights-wrongs)          356        10     15.7

Total false statements marked “true,” 973.
Total true statements marked “false,” 411.
Correlation between recall and true-false corrected, .414 (PE .095).

NOTE.—All negative scores were counted as negative in the above computations.
There were no negative scores in section X and eight negative scores in
section Y.
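The rights-wrongs rows of Table I can be checked from the printed totals. The following is a minimal sketch, assuming that the corrected score is simply the number of statements marked correctly less the number marked incorrectly, and that the wrongs are the false statements marked “true” plus the true statements marked “false.” The function and variable names are illustrative only.

```python
def rights_minus_wrongs(rights, false_marked_true, true_marked_false):
    """Corrected true-false score: statements right minus statements wrong."""
    wrongs = false_marked_true + true_marked_false
    return rights - wrongs


# Group totals as printed in Table I.
section_x = rights_minus_wrongs(rights=2024, false_marked_true=412, true_marked_false=397)
section_y = rights_minus_wrongs(rights=1740, false_marked_true=973, true_marked_false=411)

# Group sizes given in the text: 30 pupils in section X, 34 in section Y.
print("Section X:", section_x, "total,", section_x / 30, "per pupil")
print("Section Y:", section_y, "total,", section_y / 34, "per pupil")
```

On the printed totals this gives 1215 for section X and 356 for section Y, which agrees with the rights-wrongs rows of the table.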


Conclusions from Part I.—(1) Testees tend to make higher scores 
on true-false tests than on recall tests when the false statements are 
quite ridiculous. 

2. Testees tend to make higher scores on true-false tests when the 
false statements are not ridiculous provided that scores are corrected 
for attenuation. 

3. There is a tendency for the examinee to mark more false statements
“true” than true statements “false.” This tendency seems to
vary directly with the degree of discrimination required to distinguish
the falsity of a statement.

4. In view of the fact that the groups tested were heterogeneous,
the correlations between the recall and true-false tests are low.

5. Standard deviations are increased by correction for attenuation.


Part II 


Main Investigation—The test material in this investigation was 
taken from California School Law in which the subjects were receiving 
homogeneous instruction. During the fourteenth, fifteenth and 
sixteenth weeks of the spring semester of 1926 a class of 33 were tested 
as described below. 

A simple recall test of 100 points was prepared and divided into
six tests containing 17, 17, 17, 17, 16 and 16 points. These tests were
labeled A1, A2, A3, A4, A5 and A6 respectively.

These points were listed in fixed order, and accordingly as each of
100 cards, 50 of which were labelled “true” and the remainder “false,”
was drawn from a receptacle a true or a ridiculous false statement was
constructed for the point and placed in order in a true-false test. This
test was divided into six tests at the same points as the recall test, and
labeled B1, . . . B6. In like manner a series of tests C1, C2, . . . C6
was constructed in which the false statements were much less
ridiculous than those in the “B” series.
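The card-drawing procedure can be imitated as follows. This is only a sketch under stated assumptions: the cards are taken to be drawn without replacement until the receptacle is empty, the seed is arbitrary, and the names `draw_keys` and `split_into_tests` are hypothetical.

```python
import random


def draw_keys(n_points=100, n_true=50, seed=None):
    """Draw every card once, giving a random true/false key for each point."""
    rng = random.Random(seed)
    cards = ["true"] * n_true + ["false"] * (n_points - n_true)
    rng.shuffle(cards)  # shuffling is equivalent to drawing without replacement
    return cards


def split_into_tests(items, sizes=(17, 17, 17, 17, 16, 16)):
    """Cut the ordered list of points into six tests at fixed positions."""
    tests, start = [], 0
    for size in sizes:
        tests.append(items[start:start + size])
        start += size
    return tests


keys = draw_keys(seed=1926)
b_series = split_into_tests(keys)  # keys for B1 ... B6
print([len(test) for test in b_series])
```

Because the cards are exhausted, the whole series always contains 50 true and 50 false statements, although any single test of 16 or 17 points need not be evenly divided.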


Samples of the Tests. 


“A” Series (SIMPLE RECALL)


8. List the sources of income for the Public Secondary Schools of the State of 
California. 

16. How many members in the County Board of Education? How are they 
elected or appointed? Length of term? Qualifications? Salary? 


“B” Series (TRUE-FALSE, RIDICULOUS)


34. Fines collected from motorists is a source of income for the Public Secondary 
Schools of California. 
54. Members of the County Board of Education hold office during good behavior. 


“C” Series (TRUE-FALSE, LESS RIDICULOUS)


34. The Forest Reserve Fund is a source of income for the Public Secondary 
Schools of California. 

54. The members of a County Board of Education are elected for a term of four 
years. 


In order that practice effects might spread equally, the tests were 
given in the following order which exhausts the mathematical permuta- 
tions of the three series. 





SCHEDULE
                 Days
 1st    2nd    3rd    4th    5th    6th
 A1     A2     B3     B4     C5     C6
 B1     C2     A3     C4     A5     B6
 C1     B2     C3     A4     B5     A6
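That the six days exhaust the orders of the three series can be verified by enumeration. The sketch below is illustrative only; it assumes that each column of the schedule is read from top to bottom as the order of administration on that day.

```python
from itertools import permutations

# The six possible orders of the three series, in lexicographic order.
orders = list(permutations("ABC"))

# Assign one order to each day; test k of each series is given on day k.
schedule = {
    day: [f"{series}{day}" for series in order]
    for day, order in enumerate(orders, start=1)
}

for day, tests in schedule.items():
    print(f"Day {day}: {' '.join(tests)}")
```

Taken in lexicographic order, the permutations give the same six columns as the schedule, so that each series is given first, second and last on exactly two days.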
The Results. 
TABLE II

              Rights          Rights-wrongs  Falses marked  Trues marked   Thorndike
                                                 true          false       International
Pupil       A      B      C      B      C      B      C      B      C

1*         62     86     73     73     52      4     17      9      4      88.4
2          41     61     49     43     20      6     22     12      7        76
3          57     73     66     51     35     12     23     10      5        93
4*         47     77     62     67     46      4     12      6      4      67.8
5*         66     82     73     72     53      5     15      5      5      63.2
6          37     59     46     39     18     13     20      7      ?      70.6
7          48     71     62     44     25     15     20     12      8      51.6
8          47     71     62     46     25     16     28      9      9      55.2
9          55     79     70     59     48     11     17      9      5      47.8
10*        62     78     74     67     58      4      7      7      9      52.6
11*        72     80     75     78     67      2      7      0      1      85.8
12         55     75     64     57     38     11     20      7      6      32.2
13*        64     79     76     58     53     13     22      ?      1
14         54     86     69     74     39      7     22      5      8      35.4
15         57     75     68     60     46      8     17      6      5      67.2
16         50     75     61     66     35      7     22      2      4        65
17         50     66     59     49     45      9      ?      8      6      50.6
18         62     82     68     66     43      9     21      7      4      52.2
19*        69     87     78     77     63      4      9      6      6        56
20*        57     80     78     64     56      5     17     11      5      47.4
21         55     72     67     54     41     12     18      6      ?      56.8
22         51     71     55     63     41      5     10      3      5      86.1
23         54     77     70     67     50      4     15      ?      5      75.4
24         38     68     63     37     27     17     25     14     11        36
25         58     81     67     68     46      5     15      ?      6      44.6
26         63     85     62     74     34      7     20      4      8      49.6
27*        55     77     64     61     32      7     21      9     11      48.2
28         44     72     60     44     20     15     26     13     14      59.6
29*        57     70     60     54     31      6     15     10      5      63.2
30*        59     89     72     80     48      4     16      5      8      70.6
31*        61     80     70     70     53      4     11      6      6      76.4
32*        51     80     59     69     38      8     15      3      6
33*        66     76     74     65     58      7     12      5      4      51.4

Totals   1824   2520   2176   2016   1384    266    565    240    210





NOTE.—Students marked with an * were rated during the school year as studious if all 16 papers
required were handed in completed and on time.

                          A      B (rights)   C (rights)   B (R-W)    C (R-W)
Mean                      55        76           66           61         42
Reliability              .64       .87          .75          .81        .70
  (Brown’s formula)    ± .07     ± .03        ± .05        ± .04      ± .06
Sigma                   8.31       6.9          7.6         11.7       11.8

TABLE III.—CORRELATIONS

              B (rights)     B (R-W)       C (rights)     C (R-W)
A             .74 ± .05     .76 ± .05     .78 ± .05     .85 ± .03
B (rights)                  .79 ± .04     .76 ± .05     .64 ± .07
B (R-W)                                   .63 ± .07     .72 ± .06
C (rights)                                              .86 ± .03














Considering the homogeneity of the group, none of the correlations 
between the recall and true-false tests are low. Since these correla- 
tions seem to concentrate around .76, the operation and influence of 
at least one other factor is evident. 


PARTIAL CORRELATION 


Let $r_{12}$ = the correlation between B and C (rights)
Let $r_{13}$ = the correlation between B and A (rights)
Let $r_{23}$ = the correlation between C and A (rights)

Then the partial correlation may be computed by the formula:

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}}$$

which gives the correlation between the two true-false tests when
individual differences in recall are removed. Substituting and solving,
we obtain:

TABLE IV

$r_{12.3}$ = .44 (series B rights, and series C rights)
$r_{12.3}$ = .22 (series B and C, rights-wrongs)

Observations on Data.—(1) The results tabulated agree with those 
found by Toops, and Ruch and Stoddard (R3), in that correction 
for attenuation in a true-false test decreases the reliability of the 
test and increases the standard deviation. 

2. Table II shows a radical disagreement with the above mentioned
studies in respect to reliability. Both studies cited report the
reliability of the recall examination as greater than that of the true-false,
while the present study finds the opposite result both when the
scores are corrected and uncorrected for attenuation.

3. The simple recall test correlates more highly with a true-false 
test when the latter is corrected for attenuation. 

4. There is a significant correlation between true-false tests when 
individual differences in memory are eliminated if the scores are not 
corrected for attenuation (Table IV). 

5. A true-false test is more reliable if (a) the false statements are
ridiculous to the extent that a group will miss approximately as many
marked “true” as marked “false,” and (b) the scores are not corrected
for attenuation.

6. Examinees mark more than twice as many false statements
“true” as true statements “false,” when the false statements are not
ridiculous (see Table II).

7. It is possible to construct a true-false test of the opposite or
extreme nature where the great majority of points missed would be
true statements marked “false.”

8. Students who are more intelligent or studious tend to mark more
false statements “true” than true statements “false” when the false
statements are not ridiculous. When the false statements are
ridiculous the behavior of these students is reversed.

Conclusions from Part II—(A) Discrepancies between simple 
recall and true-false tests. (1) Marked correlations between these 
types of examinations are not in evidence. 

2. Scores not corrected for attenuation on true-false examinations 
are higher in general than scores on simple recall tests. This is of 
no particular importance because it is more pertinent to establish a 
distribution of more reliable marks than a series of absolute but less 
reliable scores. 

3. If true-false scores are corrected for attenuation they will be 
less than the recall scores if the false statements require a high degree 
of discrimination. 

4. Wide divergencies exist between reliabilities of simple recall and 
true-false examinations. 

(B) Causes for discrepancies. (1) The relative reliabilities of true- 
false and simple recall tests vary as the groups tested become more 
homogeneous. The same is true of correlations. 

2. The correction of true-false tests for attenuation generally
dissipates the results further. This seems to be because testees in their
errors are influenced by psychological factors at least as much as by
chance.

3. There is evidence that a true-false examination is a more reliable
measure than a recall examination if the examinees have had
homogeneous instruction or training; otherwise (R3) the reverse is the case.

4. True-false examinations as constructed by different investiga- 
tors, vary greatly between two extremes in respect to the ridiculousness 
of the false statements. This causes these tests to measure different 
abilities depending largely upon the individual examinee. 

5. The evidence points to the conclusion that the true-false exam- 
ination is a measure of a combination of the following functions: 
(a) Memorization of material. (b) Degree of discrimination. (c) 
Ability to resist suggestion. 

Consequently, the validity of the true-false examination as com- 
pared with the simple recall depends upon the objectives of the 
examiner.



WHY THE IQ IS NOT, AND CANNOT BE CONSTANT


C. S. SLOCOMBE
Lincoln School of Teachers College 


In a recent article¹ the author discussed the results of an investigation
bearing on the constitution of mental tests that give a constant
measure of intelligence. As the Binet tests and their modifications are 
probably the most widely used tests, it appears desirable to consider 
whether they do give a constant measure and if they do not, why 
they do not. 

The constancy of the IQ has occasioned much research and discussion
in America, but in all cases the result has been unsatisfactory
in that the IQ has been found relatively or approximately constant.
All writers have apparently rested content with such approximation,
without determining exactly the variation, or its causes.

Any departures from constancy may be ascribed to two possible
causes, both of which may operate: (a) the chance errors involved in
all measurements, and (b) regular variations, having some determinable
causes. It is proposed to consider particularly the latter. The method
of approach will be the same as that adopted in the article referred to
above, and will consist in the critical examination of a set of published
results.²

Baldwin and Stechner tested 40 children each year for 6 years, 
commencing at the age of 6. Their results include a table of coeffi- 
cients, which is reproduced below. 


TABLE I.—INTERCORRELATIONS OF SCORES IN STANFORD-BINET TESTS GIVEN
AT YEARLY INTERVALS FOR SIX YEARS (N = 40)

         1     2     3     4     5     6

1       --    85    73    77    81    81
2       85    --    84    80    81    75
3       73    84    --    91    83    79
4       77    80    91    --    91    86
5       81    81    83    91    --    94
6       81    75    79    86    94    --














1 Slocombe, C. S.: Constancy of General Intelligence. British Journal of
Psychology, Oct., 1926.
2 Baldwin and Stechner: Results of Consecutive Stanford-Binet Tests.
Journal of Educational Psychology, 1922.

It will be observed that there is a more or less regular lowering of
the correlation toward the corners, $r_{16}$, and a diagonal ridge of high
correlations. If the average correlation is plotted against interval, it
is found that the correlation varies inversely as the interval. Fig. 1
illustrates this inverse variation. (It is to be noted that the averages
at long interval are obtained from only a few coefficients, and are
somewhat unreliable. No notice is therefore taken of the rise shown at
an interval of five years.)

FIG. 1.—Showing that the correlation between repetitions of the
Stanford-Binet tests varies with interval. (Ordinate: average
correlation. Abscissa: interval in years.)

The ordinates are the averages of coefficients parallel to the
diagonal: $r_{12}$, $r_{23}$, etc.; $r_{13}$, $r_{24}$, etc.; and so on.

This demonstrated lowering of the correlation as the interval
lengthens indicates a progressively increasing change in the rank order
of the subjects. Before considering this further, however, it is
necessary to find whether the differences in correlation are
significantly large. This will also indicate the cause of the change.
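
The averages plotted in Fig. 1 can be recomputed from Table I. The following is a minimal sketch, assuming the tabled entries are correlations with the decimal point omitted (85 read as 0.85); the names TABLE_I and average_by_interval are illustrative and do not come from the original study.

```python
# Correlations from Table I, keyed by (earlier test, later test).
# Assumption: tabled entries are read with the decimal point restored.
TABLE_I = {
    (1, 2): 0.85, (1, 3): 0.73, (1, 4): 0.77, (1, 5): 0.81, (1, 6): 0.81,
    (2, 3): 0.84, (2, 4): 0.80, (2, 5): 0.81, (2, 6): 0.75,
    (3, 4): 0.91, (3, 5): 0.83, (3, 6): 0.79,
    (4, 5): 0.91, (4, 6): 0.86,
    (5, 6): 0.94,
}


def average_by_interval(table):
    """Average the coefficients lying on each line parallel to the diagonal."""
    groups = {}
    for (earlier, later), r in table.items():
        groups.setdefault(later - earlier, []).append(r)
    return {gap: sum(rs) / len(rs) for gap, rs in sorted(groups.items())}


averages = average_by_interval(TABLE_I)
for gap, mean_r in averages.items():
    print(f"interval of {gap} yr: mean r = {mean_r:.3f}")
```

The mean falls from 0.89 at an interval of one year to 0.78 at four years; the single coefficient at five years (0.81) produces the rise that the text disregards.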

For this determination the so-called Tetrad Differences criterion¹
of Spearman will be applied. All the Tetrad Differences have been 
calculated. Their distribution is shown in Fig. 2. 

The presence of large differences grouped around 5 PE proves that 
in addition to intelligence, and the specific factors, which may be 
regarded as chance errors of measurement, there are present group 
factors.² (Those which are common to some tests but not to all.)
An examination of the data reveals the fact that these large differences 
are due to high intercorrelation of early tests; high intercorrelation of 
late tests; accompanied by low correlation between early and late 
tests. 
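
As a rough illustration of the calculation described above, the tetrad differences can be formed from Table I as follows. This is a sketch only: the sign convention and the ordering of the three tetrads per set of four tests are assumptions, the helper names are hypothetical, and the probable error against which the paper judges the differences is not computed here.

```python
from itertools import combinations

# Correlations from Table I (decimal points restored), upper triangle only.
TABLE_I = {
    (1, 2): 0.85, (1, 3): 0.73, (1, 4): 0.77, (1, 5): 0.81, (1, 6): 0.81,
    (2, 3): 0.84, (2, 4): 0.80, (2, 5): 0.81, (2, 6): 0.75,
    (3, 4): 0.91, (3, 5): 0.83, (3, 6): 0.79,
    (4, 5): 0.91, (4, 6): 0.86,
    (5, 6): 0.94,
}


def corr(a, b):
    """Look up the correlation between tests a and b in either order."""
    return TABLE_I[(min(a, b), max(a, b))]


def tetrads(a, b, c, d):
    """Return the three tetrad differences for four tests (they sum to zero)."""
    p = corr(a, b) * corr(c, d)
    q = corr(a, c) * corr(b, d)
    s = corr(a, d) * corr(b, c)
    return (p - q, q - s, s - p)


# Every set of four tests out of six gives three tetrad differences.
all_tetrads = [t for quad in combinations(range(1, 7), 4) for t in tetrads(*quad)]

# Two early tests (1, 2) against two late tests (5, 6):
early_late = corr(1, 2) * corr(5, 6) - corr(1, 5) * corr(2, 6)
print(len(all_tetrads), round(early_late, 4))
```

The early-against-late tetrad, $r_{12} r_{56} - r_{15} r_{26}$, comes to about 0.19. It is large because the two early tests correlate highly with each other, as do the two late tests, while the correlations between early and late tests are lower.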






















¹ Spearman and Holzinger: Sampling Error in the Theory of Two Factors.
British Journal of Psychology, 1924.

² Those readers who are interested in partial correlation methods may possibly
see the group factors better by partialling out “g,” and examining the correlations
remaining. For the method of doing this see Hart and Spearman, Mental Tests
of Dementia, Journal of Abnormal Psychology, 1914.
Thus it would seem that there is a factor common to early tests,
causing the high correlation, and another common to late tests causing 
high correlation. But the comparatively low correlation between 
early and late tests indicates that the two factors are not the same. 
CONCLUSION


In repetitions of the Stanford-Binet tests three factors enter: (a)
the general intelligence which the tests attempt to measure, (b)
accidental errors, or factors specific to each occasion of testing, and (c)
factors which are common to a number of tests, but not to all. If
intelligence or mental ratio is assumed to be constant, then the varia- 
tion shown by an inconstant IQ must be due to variation in these 
group factors. If the assumption be not admitted, without demon- 
stration, this study shows that these tests are incapable of affording 
proof. 

For the measurement in early years is of intelligence plus some fac- 
tor or group of factors, (apart from the specific factors, which cannot 
be eliminated), and the measurement in later years is of intelligence 
plus some factor or group of factors, which (again apart from the spe- 
cific factors) is different from the factor or factors present in the early 
tests. Thus conclusions drawn from comparison of scores at different 
ages are invalidated by the fact that the same thing is not measured at 
these different ages. 





















CRITICAL NOTE ON THE RELIABILITY OF A TEST 


KARL F. MUENZINGER 
University of Colorado 


The purpose of this paper is to review certain phases of the problem 
of measuring the reliability of a test, having in mind particularly the 
usual kind of college examinations. 

1. This problem is usually conceived to be similar to that of the 
reliability of a measuring device, say in the physical sciences, where we 
want to know the probable accuracy with which our measurement 
approaches the true magnitude of the object measured. Two difficul- 
ties present themselves immediately: (1) The magnitudes which are 
supposed to be measured by a test (the student’s achievement and 
understanding) are not fixed and stable, but are under the influence of 
factors commonly not controlled by us. (2) The standards (the exami- 
nations) that are to do the measuring are usually unrelated to objective 
standards and depend upon the judgment of the instructor; this means 
that they, too, are unstable. The fundamental fact of the instability 
of both objects and instruments of measurement will be the basis for 
the following critical discussion of certain statistical devices and 
concepts used in this field. 

2. The Reliability Coefficient.—If we give two comparable tests to
the same class we can calculate the correlation $r_{12}$ between the two sets
of results. In such a case it is customary to call $r_{12}$ the reliability
coefficient of the type of tests given. What does this mean?

(a) Statistically it means that knowing a student’s grade in test A 
and the correlation coefficient $r_{12}$ we can, by means of the regression
equation for test B and the standard error of estimation, predict that 
the chances are 2:1 that his grade in the second test will lie within a 
definite range of grades. 
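
The statistical meaning in (a) may be sketched numerically. The example below assumes the usual regression of B on A and takes the "2:1" range to be one standard error of estimate on either side of the predicted grade (about 68 per cent of cases under a normal distribution). All figures and the function name are hypothetical and are not data from the paper.

```python
import math


def predict_with_range(grade_a, mean_a, sd_a, mean_b, sd_b, r12):
    """Predict the grade in test B from the grade in test A.

    Returns the predicted grade and the limits lying one standard
    error of estimate below and above it.
    """
    slope = r12 * sd_b / sd_a                       # regression coefficient of B on A
    predicted = mean_b + slope * (grade_a - mean_a)
    se_estimate = sd_b * math.sqrt(1.0 - r12 ** 2)  # standard error of estimate
    return predicted, predicted - se_estimate, predicted + se_estimate


# Hypothetical class statistics, chosen only for illustration.
predicted, low, high = predict_with_range(
    grade_a=85, mean_a=75, sd_a=10, mean_b=70, sd_b=12, r12=0.6
)
print(f"predicted grade in B: {predicted:.1f}, range {low:.1f} to {high:.1f}")
```

With these assumed figures a correlation of 0.6 still leaves a range nearly 20 points wide, which shows how loose the prediction is even for a moderately high coefficient.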

This statistical concept is enlarged considerably by calling $r_{12}$ the
reliability coefficient of test A or B. The assumption is introduced
that all tests comparable with A will have intercorrelations that are in
the neighborhood of $r_{12}$, and that we can predict a student’s grade
with a certain degree of accuracy for all succeeding comparable
tests if we know his grade in A. While such an assumption has
some validity in the case of an objective type of examination, its
justification in the case of an old type of examination would be
difficult to show.
Examinations of the old type are sometimes split into two “random
halves” and, by correlating the halves and using Brown’s Formula,¹
the $r_{12}$ for the whole examination is found. This method introduces
the further assumption that the two halves are comparable and that
therefore any two examinations in a course are comparable and will
have a correlation in the neighborhood of $r_{12}$. Theoretically the
reliability of a test can be ascertained if two testing situations are
strictly comparable. If such comparability is assumed, what remains
of reliability?
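
For reference, the step from the correlation of the halves to the coefficient for the whole examination may be sketched as below. This assumes that "Brown's Formula" denotes what is now generally called the Spearman-Brown formula for a test of doubled length; the function name is illustrative.

```python
def brown_full_length(r_half):
    """Estimate the reliability of the whole test from the correlation of its halves.

    Assumes the two halves are strictly comparable, which is the very
    assumption questioned in the text.
    """
    return 2.0 * r_half / (1.0 + r_half)


for r_half in (0.30, 0.50, 0.70):
    print(f"halves correlate {r_half:.2f} -> whole test {brown_full_length(r_half):.3f}")
```

The formula always returns a figure at least as high as the correlation of the halves, so any failure of comparability between the halves is carried into the estimate and enlarged with it.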

(b) Psychologically interpreted, $r_{12}$ means that there is a lack of
correspondence between the two sets of grades in A and B, because we 
tap different sources of material and because subjective and objective 
conditions are not standard. In order to have some continuity of the 
subjective conditions the use of fore-exercise in the case of comparable 
tests or the splitting up of single tests has been recommended. But 
whether we follow one or the other method, the first cause still persists: 
The material to be measured lacks homogeneity. The subject-matter 
is recalled partly by rote memory and partly by logical memory. The 
retention itself is not uniform, depending on such familiar factors as 
amount of repetition, degree of interest, and previous experience. We 
know then that $r_{12}$ expresses the degree of homogeneity of the material
measured and the degree of standardization of objective and subjective 
conditions. If one calls it a reliability coefficient one must be aware 
that the reliability measured is primarily that of the teaching methods 
of the instructor and of the student’s retention and recall. 

The conditions of uniform modes of recall and of objective methods
of scoring are met by the objective types of examination, which
therefore show a much higher $r_{12}$ than the old types. As regards the first of
these conditions a somewhat similar advantage could be gained if,
instead of splitting an examination into chance halves, one would
match question by question, if it could be found that certain questions
are comparable as regards repetition and memory type. The $r_{12}$ of
two such questions would then be similar, except as to objectivity of
scoring, to the $r_{12}$ of the objective types of examination, and it would
indicate, besides subjectivity of scoring, the selective character of
retention.





¹ A critical discussion of the use of Brown’s Formula is found in Crum, W. L.:
Note on the Reliability of a Test, etc. American Mathematics Monthly, Vol.
XXX, p. 296. See also the reply by Kelley, T. L.: Note on the Reliability
of a Test, etc. Journal of Educational Psychology, Vol. XV, 1924, p. 193.
3. The coefficient of alienation, $k = \sqrt{1 - r^2}$, has been defined
as the factor that “measures the lack of relationship between two
variables just as r measures the presence of relationship.”¹ It is
further maintained that if we want to predict a second variable y
for a certain value of a first variable x on the basis of their correlation
$r_{xy}$, then the error of estimate is still k times “as great as a sheer
random guess.”² This is an unfortunate over-statement. Strictly
speaking, k is the factor that reduces the standard deviation $\sigma_y$ of a
variable y to the standard deviation $\sigma_{y.x}$ of an array of y’s for a certain
x; that is, $\sigma_{y.x} = \sigma_y \sqrt{1 - r^2}$. Knowing only $\sigma_y$ and $M_y$ (the mean),
the accuracy of the estimate that any y has the value of $M_y$ is described
by $\sigma_y$, the standard deviation or error. Since the mean is the measure
having a smaller standard deviation (error) than any other measure of
the total distribution,³ it is therefore quite an improvement over a
sheer random guess. Now, if I also know $r_{xy}$, then by means of the
regression equation I can estimate y for a certain value of x, and this
estimate has an error still k times as great as the error of using $M_y$ as
an estimate.
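
The relation between the two errors can be checked on a small made-up set of paired grades. The sketch below is illustrative only: the data are invented, the least-squares regression of y on x is assumed, and standard deviations are taken about the mean with divisor N.

```python
import math

# Invented paired scores, used only to check the identity numerically.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation about the mean, with divisor N."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


mx, my = mean(x), mean(y)
covariance = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
r = covariance / (sd(x) * sd(y))
k = math.sqrt(1.0 - r ** 2)              # coefficient of alienation

# Error when every y is estimated by the mean M_y.
error_mean = sd(y)

# Error when y is estimated from x by the regression equation.
slope = r * sd(y) / sd(x)
residuals = [b - (my + slope * (a - mx)) for a, b in zip(x, y)]
error_regression = math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))

print(f"r = {r:.3f}, k = {k:.3f}")
print(f"error using the mean: {error_mean:.3f}")
print(f"error using regression: {error_regression:.3f}")
```

The regression error equals k times the error of using the mean, so k compares the regression estimate with the mean as an estimate, and not with a random guess.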

This phrase, ‘‘a sheer random guess,” which is not quite justifiable 
itself, is sometimes given a meaning which is even less justifiable. 
Ruch says that a correlation or reliability coefficient of 0.54, if
compared with the coefficient of alienation, “reduces the error of
prediction about 15 per cent over sheer guessing.”⁴ Then he shows in a
footnote that by sheer guessing he means the chance assignment of
grades. This is an inversion of the facts, for all we know at present 
is that the factor of chance is in the prediction of grades; it may not 
be at all in the assignment of grades. That is, even if we assumed 
perfect reliability of our measuring device, the examination, the lack 
of homogeneity of the material measured might account for any 
low $r_{12}$ we are getting. It can only confuse the problem if we compare
this causal factor with the random assignment of grades. Of course, 
as a matter of statistics, it is true that the coefficient of alienation is a 
measure of the chance distribution of a second variate as compared 
with the first, but in the case of two successive examinations we have 
no right to call the lack of correlation chance assignment of grades 
(or in this sense prediction of grades) since we are familiar with the 
factors causing a low correlation. 
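
The arithmetic behind the figure quoted from Ruch can be reproduced directly from the definition $k = \sqrt{1 - r^2}$. The sketch below assumes that the "reduction" is taken as $1 - k$; the function name and the additional values of r are illustrative.

```python
import math


def alienation(r):
    """Coefficient of alienation, k = sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)


k = alienation(0.54)
reduction = 1.0 - k
print(f"r = 0.54: k = {k:.3f}, reduction = {100 * reduction:.1f} per cent")

# A few other coefficients for comparison.
for r in (0.30, 0.70, 0.90):
    print(f"r = {r:.2f}: k = {alienation(r):.3f}")
```

For r = 0.54 this gives k of about 0.84, a reduction of roughly 16 per cent, which is close to the "about 15 per cent" quoted. On the argument of the text, this reduction is reckoned against the mean as an estimate, not against the chance assignment of grades.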





¹ Kelley, T. L.: Statistical Method, p. 173.
² Ibid., p. 174.
³ Ibid., p. 81.
⁴ Ruch, G. M.: The Improvement of the Written Examination, p. 144.
Therefore, I cannot admit a statement like this, that “an
examination no more reliable than 0.50 to 0.60 has very small value in
measuring the results of teaching.”¹ Such a correlation may show, if teaching
methods were uniform, scoring objective, and the attitude of the stu- 
dents constant, that the two examinations covered a wide range of 
the subject so that the selective factor was the causal factor. In such 
a hypothetical case even a correlation of zero would not indicate that 
an examination has no value in measuring the results of teaching. 
Our accuracy of prediction would be low, to be sure, and while it is 
desirable to have greater accuracy in this respect, I do not think we 
should make accuracy of prediction a measure of the reliability of 
an examination. 

4. The concept of the true grade is used in reliability studies of 
examinations as an approach to a measure in which a single grade 
may vary from the “true” or “real” grade of a student. The true 
grade (similar to the true magnitude in physical measurements) is 
then supposed to be the average of many grades. It follows from the 
premises in (1) that an average of many grades cannot be a true
grade since each grade measures a different magnitude. If we could 
test the same part of the subject-matter a number of times under 
similar conditions then we might with some propriety speak of the 
average of the tests as the true grade as far as this limited material 
is concerned. But if we test different parts of the subject-matter, 
then an average grade is not the true grade but merely a convenient 
device for summarizing the various results of different measurements. 
The question presents itself, whether a concept which is even theo- 
retically untenable can be helpful? Would it not lead to a clearer 
understanding of the problem if it were discarded entirely? 

5. Conclusion.—The value of reliability studies of examinations
is undoubted. It is highly important that we be shown definitely
how variable the results of our examinations are, and that, in regard
to the new type of examination, objectivity in scoring where rote
memory is concerned reduces this variability considerably. In criticizing
some concepts underlying reliability studies it has been my intention
to contribute to a clearer interpretation of the results in the following
respects: (a) Even comparable tests are not comparable as regards the
homogeneity of the material measured. (b) The reliability
coefficient, besides being an indication of the consistency of testing
conditions and of scoring, is primarily an indication of the consistency in
the material measured. This lack of consistency or homogeneity
of the material depends largely upon variations in teaching methods and
the selective character of forgetting. (c) The accuracy in predicting
future grades on the basis of a single test is also a function of these
factors. (d) In using the coefficient of alienation in indicating the
chance factor involved in the reliability coefficient of a test, this chance
factor must be understood strictly to refer to the prediction of grades
on the basis of the regression equation as compared with the mean as
prediction, and not to the chance element in grading. (e) The concept
of the true grade adds a meaning to the average of a number of grades
which is theoretically untenable and which may be misleading.

¹ Ibid., p. 144.
How to Teach General Science, by J. O. Frank. Philadelphia: P.
Blakiston’s Son and Co., 1926. Pp. XXII + 240. 


General Science, statistically non-existent in 1915, has had such 
unprecedented growth that recent Bureau of Education statistics not 
only place it at the head of the natural sciences in terms of pupil 
enrollment but accord it seventh rank among 44 subjects of the 
high school on the basis of the number of schools giving the course. 

The recruiting of adequately trained teachers of General Science 
to meet the demand presented a dilemma which has led to makeshifts 
among teachers of special science untrained in either the new point of 
view or the enlarged content of the new organization. This phase 
of the situation makes inservice training imperative and it is in this 
connection that this volume makes its widest appeal. 

The author develops the historical background and growth of Gen- 
eral Science in the early chapters and then assembles and interprets 
the gist of the large mass of literature that has appeared in the field, 
thus inducting the reader into the unique point of view of the subject 
with economy and effectiveness. 

The chapters on content, methods and materials of instruction 
are filled with concrete and practical suggestions of especial value to 
the classroom teacher. In his Special Teaching Aids, he has assembled 
the sources of valuable instructional materials particularly helpful 
to the beginning teacher. Perhaps a more extended, concrete treat- 
ment of the measurement of results, especially of classroom tests, 
would seem desirable. There is an extensive, selected bibliography. 

The book should prove useful to all teachers of General Science, 
and a valuable guide to the preparatory and inservice training of the 
inexperienced. JEROME G. KUDERNA. 

Lincoln School of Teachers College. 
TEACHING ELEMENTARY ENGLISH


Self-help Methods of Teaching English, by Julia H. Wohlfarth. World 
Book Co., Yonkers-on-Hudson, 1925. Pp. VIII + 294. 


The author says that this title “indicates that learning is a self-help
process, and that only the kind of training that results in the self-help
type of response recognizes the full significance of the word teaching.”
The book is better than the title, whatever it may mean. Good prac- 
tical chapters deal with such matters as first steps in composition, the 
textbook, oral and written composition in Grades III to IX, and grade 
objectives. Many devices, illustrations, specific directions, and 
listings of cautions, suggestions, and important points are presented. 
Theoretical discussion is suppressed, and psychology, while adequately 
considered in the methods described, is not itself paraded. The treat- 
ment is not such as to constitute anything like a complete teaching 
guide, but what it includes is of real consequence for any teacher, 
new or old, of elementary school English. M. H. WILLING.